We investigate the concept of entropy in probabilistic theories more general than quantum mechanics,
with particular reference to the notion of information causality recently proposed by Pawlowski
et al. (arXiv:0905.2992). We consider two entropic quantities, which we term measurement and
mixing entropy. In the context of classical and quantum theory, these coincide, being given by the 
Shannon and von Neumann entropies respectively; in general, however, they are very different. In 
particular, while measurement entropy is easily seen to be concave, mixing entropy need not be. In 
fact, as we show, mixing entropy is not concave whenever the state space is a non-simplicial poly- 
tope. Thus, the condition that measurement and mixing entropies coincide is a strong constraint 
on possible theories. We call theories with this property monoentropic. 

Measurement entropy is subadditive, but not in general strongly subadditive. Equivalently, if 
we define the mutual information between two systems A and B by the usual formula $I(A : B) =
H(A) + H(B) - H(AB)$, where H denotes the measurement entropy and AB is a non-signaling
composite of A and B, then it can happen that $I(A : BC) < I(A : B)$. This is relevant to information
causality in the sense of Pawlowski et al.: we show that any monoentropic non-signaling theory in 
which measurement entropy is strongly subadditive, and also satisfies a version of the Holevo bound, 
is informationally causal, and on the other hand we observe that Popescu-Rohrlich boxes, which 
violate information causality, also violate strong subadditivity. We also explore the interplay between 
measurement and mixing entropy and various natural conditions on theories that arise in quantum 
axiomatics. 



I. INTRODUCTION 



One can view quantum mechanics as an extension of 
the classical probability calculus, allowing for random 
variables that are not simultaneously measurable. In 
order to gain a clearer understanding of quantum the- 
ory from this perspective, it is useful to contrast it with 
various (factitious) alternatives that are neither classical 
nor quantum. The best known example of such a "foil" 
probabilistic theory is probably the theory of "non-local 
boxes" P, [HjH^I; but in fact, there is a standard mathematical
framework for such theories, going back to the
work of Mackey in the 1950s [26]. Working in this framework,
one can show that many phenomena commonly
regarded as characteristically quantum - no-cloning and 
no-broadcasting theorems [2, y], the trade-off between 
state disturbance and measurement [l[ , and the existence 
and basic properties of entangled states P, H, HI, |23| - are 
actually quite generic features of all non-classical prob- 
abilistic theories satisfying a basic "non-signaling" con- 
straint. Other quantum phenomena, such as the possi- 
bility of teleportation ^] or remote steering of ensembles 
0, are more special (and in some sense, more classical), 
but can still be seen to arise outside the boundaries of 
quantum theory. 

One might hope to find some reasonably short list 
of probabilistic or information-theoretic phenomena that 
more cleanly separate quantum theory from other possible
non-signaling theories. In a recent paper [27],
Pawlowski et al. take a step in this direction by
showing that any non-signaling correlation violating the
Tsirel'son bound also violates a qualitative information- 
theoretic principle they call information causality (IC). In 
essence, this prohibits a form of "multiplexing" in which 
one party (Bob) gains the ability to access a total of more 
than m bits of information held by another party (Alice), 
on the basis of an m-bit message from Alice, plus some 
shared non-signaling bipartite state. It is also established 
in [27] that quantum mechanics - and hence, also classical
probability theory - satisfies this IC constraint.

In establishing that quantum mechanics satisfies IC, 
Pawlowski et al. make use only of standard formal prop- 
erties of the von Neumann entropy of joint quantum 
states. This raises the obvious question of where their 
proof breaks down in other contexts (e.g., a PR box) in 
which IC fails. In order to address this question, we de- 
velop some of the basic machinery of entropy, conditional 
entropy and mutual information in a very general proba- 
bilistic setting — an independently interesting problem, 
which seems not to have received much previous attention
(an exception being the paper [20] of Hein).

We begin by identifying two notions of entropy, which 
we call measurement and mixing entropy, and which we 
denote respectively by $H(A)$ and $S(A)$, where A is a
general probabilistic model. Briefly: the measurement 
entropy of a system is the minimum Shannon entropy 
of any possible measurement thereon, while the mixing 
entropy is the infimum of the Shannon entropies of the 
various ways of preparing the system's state as a mixture 
of pure states. These coincide classically and in quantum 
theory, but are generally quite different animals. For ex- 
ample, measurement entropy is always subadditive, and 
is concave; mixing entropy is generally neither. In fact, 
in Appendix A, we show that there are always violations 
of concavity of the mixing entropy for any system with 
a state space that is a non-simplicial polytope. Thus, 
the condition that mixing and measurement entropies do 
coincide, as in quantum mechanics, is a powerful con- 
straint on the structure of a probabilistic theory. We call 
theories with this feature monoentropic. 

Next, we develop an account of joint measurement 
entropy, conditional entropy, and mutual information 
for composite systems, and apply this apparatus to the 
notion of information causality given in [27]. Somewhat
surprisingly, it seems that the main issue is not
so much one of the strength of non-local correlations,
but rather the failure of two other, very basic principles.
One is strong subadditivity or, equivalently,
the condition that the mutual information, defined by
$I(A : B) = H(A) + H(B) - H(AB)$, satisfy

$$I(A : BC) \geq I(A : C).$$

This holds both classically and in quantum theory, but is 
violated in very simple non-classical models - even mod- 
els in which A and B are classical, so that no issue of 



non-locality can arise. Another basic principle, equivalent
to the Holevo bound, is that $I(E : B) \leq I(A : B)$,
where E is any particular measurement on system A.

Both strong subadditivity and the Holevo bound can 
be viewed as special cases of an even more basic principle, 
usually called the data processing inequality. This asserts 
that, for any systems A, B and B', and any reasonable
process $\mathcal{E} : B \to B'$, we have $I(A : \mathcal{E}(B)) \leq I(A : B)$
(where $\mathcal{E}(B) := B'$ is the output system of the process).
This is intuitively appealing as a basic physical postulate. 

Finally, we apply the apparatus just described to the 
notion of information causality. We consider in detail the 
basic example, due to van Dam [33], of an IC-violating
composite system, and find that it exhibits a violation 
of strong subadditivity. We also establish that, within a 
very broad class of finite-dimensional monoentropic theo- 
ries, strong subadditivity together with the Holevo bound 
entail information causality. It remains an open question 
whether all three of these conditions are necessary for 
this conclusion. 

The remainder of this paper is organized as follows. In 
Section II, we review in some detail the framework of generalized
probability theory, largely following [8]. In Section III,
we define, and establish some elementary properties
of, measurement and mixing entropy for states of an
arbitrary probabilistic model. Section IV discusses com- 
posite systems in our framework, and collects some obser- 
vations about the behavior of joint measurement entropy, 
and the notion of mutual information based on this. Us- 
ing this apparatus, we establish in Section V that any 
monoentropic probabilistic theory in which measurement 
entropy is strongly subadditive and satisfies the Holevo 
bound, is informationally causal in the sense of [27]. We
also point out that violations of strong subadditivity 
are possible in theories having no entanglement. Sec- 
tion VI collects some final remarks and open questions. 
Appendix A contains the proof that mixing entropy is 
not concave on state spaces that are non-simplicial poly- 
topes. Appendix B establishes some further properties 
of monoentropic theories, relevant to axiomatic charac- 
terizations of quantum theories, and also shows that mo- 
noentropicity follows from two other properties, steering 
and pure conditioning, the physical content of which may 
be more transparent. Finally, in Appendix C, we discuss
how the framework of this paper relates to the "convex 
sets" framework, and consider analogous definitions of 
measurement entropy in that context. 



II. GENERAL PROBABILISTIC MODELS 



As we mentioned above, there is a more or less 
standard mathematical framework for discussing general 
probabilistic models, going back at least to the work of
Mackey in the 1950s, and further developed (or, in some
cases, rediscovered) in succeeding decades by various authors
[E [H, 111 [il, [H, ii]. In what follows, we work in
the idiom of [8], which we briefly recall.

We characterize a probabilistic model, or, more briefly, 
a system, by a pair $A = (\mathfrak{A}, \Omega)$, where $\mathfrak{A}$ is a collection
- possibly infinite - of discrete classical experiments or
measurements, and $\Omega$ is a set of states. We make the
following assumptions: 

(i) Every experiment in $\mathfrak{A}$ is defined by its set of possible
outcomes, so that we may represent $\mathfrak{A}$, mathematically,
as a collection of sets $E, F, \ldots$. In the
language of [IB, [sH], this is a test space; accordingly,
we refer to the various sets $E, F, \ldots \in \mathfrak{A}$ as
tests.

(ii) Every state is entirely determined by the probabilities
it assigns to the outcomes of the various measurements
in $\mathfrak{A}$. Thus, letting $X := \bigcup \mathfrak{A}$ denote
the total outcome space of $\mathfrak{A}$, $\Omega$ consists of functions
$\alpha : X \to [0, 1]$ with $\sum_{x \in E} \alpha(x) = 1$ for every
test $E \in \mathfrak{A}$.

(iii) The state space $\Omega$ is a convex subset of $[0, 1]^X$ (the
functions from X to [0, 1]). Hence any statistical
mixture of states is a state.

For a given test space $\mathfrak{A}$, one can define the space of all
states on $\mathfrak{A}$. This is called the maximal state space and
is denoted by $\Omega(\mathfrak{A})$. It is clearly convex. The physical
state space $\Omega$ is necessarily either equal to or a subset of
the maximal state space.
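
As an illustration (not part of the original development), assumptions (i)-(iii) can be checked mechanically for a finite test space. The sketch below represents a test space as a list of outcome sets and a state as a dictionary of outcome probabilities. The helper names `is_state` and `mix`, and the two-test example, are hypothetical choices made here for illustration only.

```python
from math import isclose

def is_state(test_space, alpha, tol=1e-9):
    """Return True if alpha is a probability weight on test_space.

    test_space: iterable of tests, each a collection of outcome labels.
    alpha: dict mapping outcomes to probabilities (missing outcomes count as 0).
    """
    outcomes = set().union(*test_space)
    in_range = all(-tol <= alpha.get(x, 0.0) <= 1 + tol for x in outcomes)
    # Assumption (ii): the probabilities sum to one on every test.
    normalized = all(
        isclose(sum(alpha.get(x, 0.0) for x in test), 1.0, abs_tol=tol)
        for test in test_space
    )
    return in_range and normalized

def mix(weights, states):
    """Convex combination sum_i t_i * alpha_i of states given as dicts."""
    outcomes = set().union(*states)
    return {x: sum(t * s.get(x, 0.0) for t, s in zip(weights, states))
            for x in outcomes}

# Hypothetical example: two two-outcome tests with disjoint outcome sets.
tests = [{"a", "a'"}, {"b", "b'"}]
alpha1 = {"a": 1.0, "a'": 0.0, "b": 1.0, "b'": 0.0}
alpha2 = {"a": 0.0, "a'": 1.0, "b": 0.0, "b'": 1.0}

# Assumption (iii): a statistical mixture of states is again a state.
mixture = mix([0.25, 0.75], [alpha1, alpha2])
```

Here `is_state(tests, mixture)` holds because normalization on each test is preserved under convex combination, which is the content of the convexity of the maximal state space.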

This framework, though very simple, is broad enough 
to accommodate both measure-theoretic classical probability
theory and non-commutative probability theory
based on von Neumann algebras [40]. In this paper,
we shall be interested exclusively in discrete, finite-dimensional
systems. Accordingly, from this point forward,
we make the standing assumptions that (i) $\mathfrak{A}$ is
locally finite, meaning that all tests $E \in \mathfrak{A}$ are finite sets
[41], and (ii) $\Omega$ is finite dimensional and closed.

As is easily checked, local finiteness guarantees that the
maximal state space $\Omega(\mathfrak{A})$ is compact; thus, the closedness
of the physical state space $\Omega$ ensures that it, too,
is compact [42]. It follows that every state can be represented
as a finite convex combination, or mixture, of pure
states, that is, extreme points of $\Omega$.

We now consider several examples. For us, a classical 
system corresponds to a pair $(\{E\}, \Delta(E))$, where the test
space $\{E\}$ consists of a single measurement and where
$\Delta(E)$ denotes the entire simplex of probability weights
on E. In other words, there is just one test and any
probability distribution over the outcomes is a possible
state. A quantum system corresponds to $(\mathfrak{F}(\mathbf{H}), \Omega(\mathbf{H}))$,
where $\mathfrak{F}(\mathbf{H})$ is the set of (unordered) orthonormal bases
of a complex Hilbert space $\mathbf{H}$ and $\Omega(\mathbf{H})$ is the set of
density operators [43].

A simple example that is neither classical nor quantum, 
and to which we shall refer often, is the "two-bit" test 
space $\mathfrak{A}_2 = \{\{a, a'\}, \{b, b'\}\}$, consisting of a pair of
two-outcome tests, depicted in Fig. 1. The full state space
$\Omega(\mathfrak{A}_2)$ is isomorphic to the unit square $[0, 1]^2$ under the
map $\alpha \mapsto (\alpha(a), \alpha(b))$ and is depicted in Fig. 2. Accordingly,
we shall call a system of this form a square bit or
squit. A PR box is a particular entangled state of two
squits, as discussed below in Section IV A.


FIG. 1: The "two-bit" test space $\mathfrak{A}_2 = \{\{a, a'\}, \{b, b'\}\}$. It
is depicted using a Greechie diagram, wherein vertices denote 
outcomes and every smooth line through a set of vertices rep- 
resents a test. 

FIG. 2: The squit state space $\Omega(\mathfrak{A}_2)$. The pure state $\alpha_1$ yields
the outcome a in test {a, a'} and the outcome b in {b, b'},
that is, $\alpha_1(a) = 1$, $\alpha_1(a') = 0$, and $\alpha_1(b) = 1$, $\alpha_1(b') = 0$.
Similarly, $\alpha_2$, $\alpha_3$ and $\alpha_4$ yield the pairs of outcomes (a, b'), (a', b)
and (a', b') respectively.



III. MEASUREMENT AND MIXING 
ENTROPIES 



Let $\mathbf{H}$ be a finite-dimensional Hilbert space, representing
a quantum system. The von Neumann entropy of a
state $\rho$ on this system is defined as $-\mathrm{Tr}(\rho \log \rho)$, where
here and elsewhere, logarithms have base 2. Equivalently,
it is the Shannon entropy of the coefficients $\lambda_i$ in the
spectral decomposition $\rho = \sum_i \lambda_i P_i$ (where the $P_i$ are
$\rho$'s rank-one eigenprojections). In effect, the spectral decomposition
privileges a particular convex decomposition
of the state, and (up to phases) a privileged test in $\mathfrak{F}(\mathbf{H})$.
In our much more general setting, where we have nothing
like a spectral theorem, how might we define the entropy
of a state? The following definitions suggest themselves.

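Definition 1 Let $\alpha$ be a state on $\mathfrak{A}$. For each test $E \in \mathfrak{A}$,
define the local measurement entropy of $\alpha$ at E, $H_E(\alpha)$,
to be the classical (Shannon) entropy of $\alpha|_E$, i.e.,

$$H_E(\alpha) := -\sum_{x \in E} \alpha(x) \log(\alpha(x)).$$

The measurement entropy of $\alpha$, $H(\alpha)$, is the infimum of
$H_E(\alpha)$ as E ranges over $\mathfrak{A}$, i.e.,

$$H(\alpha) := \inf_{E \in \mathfrak{A}} H_E(\alpha).$$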

Note that the measurement entropy of a state of $A =
(\mathfrak{A}, \Omega)$ depends entirely on the structure of $\mathfrak{A}$, and is independent
of the choice of state space $\Omega$. It will often be
convenient to write $H(\alpha)$ as $H(A)$, where context makes
clear which state is being considered.

For the remainder of this paper we make, and shall 
make free use of, the assumption that the measurement 
entropy of a state is actually achieved on some test, i.e., 
that $H(\alpha) = H_E(\alpha)$ for some $E \in \mathfrak{A}$. This is the case in
quantum theory, and can be shown to hold much more
generally, given some rather weak analytic requirements
on an abstract model $(\mathfrak{A}, \Omega)$ - for details, see Appendix B.
It follows that $H(\alpha) = 0$ if and only if there is a test such
that $\alpha$ assigns probability 1 to one of its outcomes.

Definition 2 Let $\alpha$ be a state on $\mathfrak{A}$. The mixing (or
preparation) entropy for $\alpha$, denoted $S(\alpha)$, is the infimum
of the classical (Shannon) entropy $H(p_1, \ldots, p_n)$ over all
finite convex decompositions $\alpha = \sum_i p_i \alpha_i$ with the $\alpha_i$ pure.

Again, we write $S(A)$ for $S(\alpha)$ where $\alpha$ belongs to the
state space of a system $A = (\mathfrak{A}, \Omega)$. In contrast to
measurement entropy, the mixing entropy of a state depends
only on the geometry of the state space $\Omega$, and is
independent of the choice of test space $\mathfrak{A}$. The mixing
entropy of a pure state is 0.

Trivially, in classical probability theory, measurement 
and mixing entropies coincide, both being simply the 
Shannon entropy. Much less trivially, measurement and 
mixing entropies also coincide in quantum theory, where 
they equal the von Neumann entropy [44]. As the following
example shows, however, measurement and mixing
entropies can be quite different. 

Example 1 (The firefly model [45]) Let $\mathfrak{A} =
\{\{a, x, b\}, \{b, y, c\}, \{c, z, a\}\}$. This test space is
depicted in Fig. 3. One can check that $\Omega(\mathfrak{A})$
has five pure states, one of which is given by
$\alpha(a) = \alpha(b) = \alpha(c) = 1/2$, $\alpha(x) = \alpha(y) = \alpha(z) = 0$:
since this is pure, $S(\alpha) = 0$, yet $H(\alpha) = 1$. On the
other hand, consider the pure states $\beta$ and $\gamma$ determined
by $\beta(b) = \beta(z) = 1$ and $\gamma(x) = \gamma(y) = \gamma(z) = 1$:
their average, $\omega = \frac{1}{2}\beta + \frac{1}{2}\gamma$, has mixing entropy
$S(\omega) = 1$. This follows from the fact that the only
convex decomposition of $\omega$ into pure states is into $\beta$ and
$\gamma$, which in turn follows from the fact that these are the
only pure states that assign probability one to z. On the
other hand, $\omega(z) = 1$, so $H(\omega) = 0$.
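
The measurement entropies quoted in Example 1 can be reproduced with a few lines of Python. This is a minimal sketch assuming a finite test space, so that the infimum of Definition 1 is a minimum; the function names `local_entropy` and `measurement_entropy` are hypothetical labels introduced here.

```python
from math import log2, isclose

def local_entropy(test, alpha):
    """Shannon entropy (base 2) of alpha restricted to one test."""
    return -sum(alpha[x] * log2(alpha[x]) for x in test if alpha[x] > 0)

def measurement_entropy(test_space, alpha):
    """Minimum of the local entropies over a finite collection of tests."""
    return min(local_entropy(test, alpha) for test in test_space)

# The firefly test space: three three-outcome tests overlapping pairwise.
firefly = [("a", "x", "b"), ("b", "y", "c"), ("c", "z", "a")]

# The non-deterministic pure state alpha of Example 1.
alpha = {"a": 0.5, "b": 0.5, "c": 0.5, "x": 0.0, "y": 0.0, "z": 0.0}

# The two pure states assigning probability one to z, and their average.
beta = {"a": 0.0, "b": 1.0, "c": 0.0, "x": 0.0, "y": 0.0, "z": 1.0}
gamma = {"a": 0.0, "b": 0.0, "c": 0.0, "x": 1.0, "y": 1.0, "z": 1.0}
omega = {k: 0.5 * beta[k] + 0.5 * gamma[k] for k in beta}

h_alpha = measurement_entropy(firefly, alpha)   # one bit on every test
h_omega = measurement_entropy(firefly, omega)   # zero, attained on {c, z, a}
```

The mixing entropies $S(\alpha) = 0$ and $S(\omega) = 1$ are not computed here; they follow from the purity of $\alpha$ and the uniqueness of the decomposition of $\omega$ argued in the example.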







FIG. 3: The Greechie diagram for the test space of the firefly 
model. 



Even in the general case, measurement entropy is quite 
well behaved. For example, it is easy to see that $H(\alpha)$ is
continuous as a function of $\alpha$. Further,

Theorem 1 Measurement entropy is concave, i.e., if
$\sum_i t_i \alpha_i$ is a convex combination of states $\alpha_i$ on A, then

$$H\Big(\sum_i t_i \alpha_i\Big) \geq \sum_i t_i H(\alpha_i). \qquad (1)$$

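Proof. Since for each test E the local entropy $H_E$ is
concave,

$$H\Big(\sum_i t_i \alpha_i\Big) = \inf_{E \in \mathfrak{A}} H_E\Big(\sum_i t_i \alpha_i\Big)
\geq \inf_{E \in \mathfrak{A}} \sum_i t_i H_E(\alpha_i) \geq \sum_i t_i H(\alpha_i).$$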

□ 

Mixing entropy is, by contrast, a curious beast. The 
following example shows that it need not be continuous 
as a function of the state. 

Example 2 Let $\Omega \subset \mathbb{R}^3$ be the convex hull of the circle
$C = \{(x, y, 0) \mid x^2 + y^2 = 1\}$ and the line segment $L =
\{(1, 0, t) \mid -1 \leq t \leq 1\}$ (Figure 4). Let $\alpha$ denote the point
of intersection of L and C, i.e., the point (1, 0, 0). The
extreme points of this set are evidently the endpoints of
L, together with the points of $C \setminus \{\alpha\}$. Note that $\alpha$ has
a unique decomposition as a mixture of extreme points
of $\Omega$, namely, as an equal mixture of the endpoints of L.
Thus, $S(\alpha) = 1$. On the other hand, $\alpha$ can be approached
as closely as we like by extreme points belonging to $C \setminus
\{\alpha\}$, which have mixing entropy 0. The mixing entropy
is therefore discontinuous at $\alpha$.




FIG. 4: Example of a state space $\Omega$ for which mixing entropy
is not everywhere continuous (see Example 2).



Example 3 Let $\Omega$ be a square. Let $\alpha$ and $\beta$ be the midpoints
of adjacent faces, noting that these each have unit
mixing entropy, $S(\alpha) = S(\beta) = 1$. Let $\gamma = \frac{1}{2}(\alpha + \beta)$
be the midpoint of the line segment between $\alpha$ and $\beta$,
and note that it also lies on the line segment between
antipodal vertices of $\Omega$ (the diagonal through the square
between the chosen faces). But given that $\gamma$ is not at
the midpoint of this diagonal, the Shannon entropy for
the associated convex decomposition is less than one,
as is therefore the infimum over convex decompositions.
Therefore, the mixing entropy for $\gamma$ satisfies $S(\gamma) < 1$.
Consequently, $S(\gamma) < \frac{1}{2}S(\alpha) + \frac{1}{2}S(\beta)$, and we have
a failure of concavity of the mixing entropy.
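
The numbers in Example 3 can be checked with the following sketch, which assumes a full-dimensional polytope with finitely many vertices. Because Shannon entropy is concave in the weights, its minimum over all decompositions is attained at an extreme weight vector, and every extreme weight vector is supported on an affinely independent set of vertices; it therefore suffices to enumerate simplices of dim + 1 vertices and compute barycentric coordinates. The coordinates chosen for the square and the helpers `solve` and `mixing_entropy` are assumptions made here for illustration.

```python
from fractions import Fraction
from itertools import combinations
from math import log2

def solve(matrix, rhs):
    """Solve a square linear system exactly; return None if it is singular."""
    n = len(matrix)
    rows = [[Fraction(v) for v in row] + [Fraction(b)]
            for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            return None
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    return [row[n] for row in rows]

def mixing_entropy(vertices, point):
    """Minimum Shannon entropy over decompositions of point into vertices.

    Assumes the vertices affinely span the space, so that every extreme
    decomposition appears (possibly with zero weights) in some simplex.
    """
    dim = len(point)
    best = None
    for subset in combinations(vertices, dim + 1):
        # Barycentric coordinates: sum_i w_i v_i = point and sum_i w_i = 1.
        matrix = [[v[k] for v in subset] for k in range(dim)]
        matrix.append([1] * (dim + 1))
        weights = solve(matrix, list(point) + [1])
        if weights is not None and all(w >= 0 for w in weights):
            value = -sum(float(w) * log2(float(w)) for w in weights if w > 0)
            best = value if best is None else min(best, value)
    return best

half, quarter = Fraction(1, 2), Fraction(1, 4)
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
alpha = (half, 0)                 # midpoint of the bottom edge
beta = (1, half)                  # midpoint of the adjacent right edge
gamma = (3 * quarter, quarter)    # midpoint of the segment from alpha to beta

s_alpha = mixing_entropy(square, alpha)
s_beta = mixing_entropy(square, beta)
s_gamma = mixing_entropy(square, gamma)
```

With these coordinates, $\gamma$ lies on the diagonal from (1, 0) to (0, 1) with weights 3/4 and 1/4, so `s_gamma` is the binary entropy of 1/4 (about 0.811), strictly below the average of `s_alpha` and `s_beta`.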





FIG. 5: Failure of concavity of mixing entropy for a squit. 



In fact, the failure of concavity for the mixing entropy 
is quite generic. 



Theorem 2 Mixing entropy is not concave whenever the 
state space is a non-simplicial polytope. 

The proof is given in Appendix A. It follows that an
assumption of concavity for the mixing entropy forces 
the state space to be either a simplex (i.e. classical) or 
not a polytope. Hence such an assumption or one that 
implies it may be a useful tool in axiomatizing quantum 
theory. 

It is natural to ask what follows from the condition 
that, as in classical and quantum theories, measurement 
and mixing entropy coincide. One immediate consequence
is that mixing entropy will be concave. In view
of Theorem 2, this implies that either the system is essentially
classical, or there are an infinite number of pure
states. Hence equality of measurement and mixing entropies
narrows down possible theories quite a lot. We
discuss this matter further in Appendix B.

Both measurement and mixing entropy have been considered
before, notably by Hein [20], in a similar context,
albeit with somewhat different aims than ours in view. 
There are various other entropic quantities one could rea- 
sonably consider. For example, a concept of entropy that 
might be more closely related to operational tasks is the 
supremum, over convex decompositions of the state and 
over tests, of the classical mutual information between 
the random variable specifying the element of the con- 
vex decomposition, and the random outcome of the test. 
Natural analogues of this quantity and of the measure- 
ment and preparation entropies defined above exist in 
the closely related ordered linear spaces framework (also 
known as the convex sets framework) for theories. Test 
space models such as we have defined above induce or- 
dered linear spaces models by a linearization procedure 
that embeds the test space in a vector space and identi- 
fies outcomes in the test space with certain elements of 
the dual vector space; this procedure allows one to define 
concepts of measurement entropy more tightly related to 
the geometry of the state space, but that can usually be 
viewed as special cases of the test space definition. Appendix C
gives a further brief discussion of this.

From this point on, we focus mainly on measurement 
entropy. As always with mathematical definitions, there 
is a certain tension between the ideals of flexibility and 
generality, on the one hand, and, on the other, the desire 
to avoid annoying pathologies. Our test-space dependent 
definition of measurement entropy definitely errs on the 
side of the former, in that it is consistent with quite ab- 
surd examples. For example, if one includes in one's test 
space a test having a single outcome, then all states will 
automatically have zero entropy. One can avoid such dif- 
ficulties by placing various restrictions on the test spaces 
to be considered, at the cost of a slightly more involved 
technical development. Going to the linearized setting 
mentioned above may also help. Our work in this paper 
does not demand such fastidiousness, however, as our results
are of a very general character.



IV. COMPOSITE SYSTEMS AND JOINT 
ENTROPY 



Most of the interesting problems of information theory 
involve more than one system. The following subsection 
describes how to treat composite systems in the language 
of test spaces. The idea is that, given systems A and B, 
the joint system AB should be associated with a test 
space and state space of its own. However, there is not 
a unique recipe for determining test and state spaces for 
AB given the test and state spaces for A and B. In- 
stead, a theory must give additional rules that specify 
how systems combine. [46] Our results will pertain to a 
variety of notions of composition, although we limit the 
scope by requiring certain properties to hold. In par- 
ticular, we assume that the test space of the composite 
includes all product tests and conditional two-stage tests 
(where one party's choice of test is conditioned on the 
outcome of the other party's test). One motivation for 
this is to have a test space that is sufficiently rich to be 
interesting. Another motivation is that this assumption 
guarantees that all states are non-signaling. We go on to 
define analogues of familiar quantities, such as joint en- 
tropies and the mutual information, which are used later 
to analyze information causality. 



A. Composite Systems 



Consider two systems, A and B, where $A = (\mathfrak{A}, \Omega^A)$
and $B = (\mathfrak{B}, \Omega^B)$. For convenience, assume that these
are controlled by two parties, called Alice and Bob. The
first, and most basic, assumption we shall make is that
Alice can perform any test $E \in \mathfrak{A}$ simultaneously with
Bob performing any test $F \in \mathfrak{B}$. This can be regarded
as a single product test. The possible outcomes of this
product test are pairs of the form $(e, f) \in E \times F$.

Definition 3 The Cartesian product of the test spaces $\mathfrak{A}$
and $\mathfrak{B}$ is the collection of all product tests. It is denoted
$\mathfrak{A} \times \mathfrak{B}$.



The set $\Omega(\mathfrak{A} \times \mathfrak{B})$ of all states that can be defined
on the Cartesian product test space, typically includes 
signaling states, which allow Alice to send messages in- 
stantaneously to Bob, or vice versa, by varying her choice 
of which test to perform. 



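Definition 4 A state $\omega^{AB}$ on $\mathfrak{A} \times \mathfrak{B}$ is non-signaling iff

$$\sum_{e \in E} \omega^{AB}(e, f) = \sum_{e' \in E'} \omega^{AB}(e', f) \quad \forall f, E, E', \qquad (2)$$

and likewise with the roles of the two systems interchanged.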

If a state $\omega^{AB}$ is non-signaling, it is possible to define the
marginal (or reduced) state $\omega^A$ via

$$\omega^A(e) = \sum_{f \in F} \omega^{AB}(e, f), \qquad (3)$$

where Eq. (2) ensures that the right hand side is independent
of $F \in \mathfrak{B}$. The marginal $\omega^B$ is defined similarly.

If $\omega^{AB}$ is non-signaling, it is also possible to define
a conditional state, $\omega^{B|e}$. Informally, this is the updated
state at Bob's end following the outcome e being obtained
for a test at Alice's end:

$$\omega^{B|e}(f) := \omega^{AB}(e, f) / \omega^A(e).$$

By convention, $\omega^{B|e}$ is zero if $\omega^A(e)$ is zero. The conditional
state $\omega^{A|f}$ is defined similarly.
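
To make the non-signaling condition, marginals, and conditional states concrete, the following sketch evaluates them for two states on the Cartesian product of two squit test spaces. The encoding of an outcome as a pair (test index, bit), the PR-box weights, the deliberately signaling state, and all function names are illustrative assumptions, not notation from the text.

```python
from math import isclose

# Each party has two two-outcome tests; an outcome is (test index, bit).
TESTS = [[(0, 0), (0, 1)], [(1, 0), (1, 1)]]

def pr_box(e, f):
    """PR-box weights: the bits differ exactly when both parties use test 1."""
    (x, a), (y, b) = e, f
    return 0.5 if (a ^ b) == (x & y) else 0.0

def signaling_state(e, f):
    """A state in which Bob's bit copies Alice's choice of test."""
    (x, a), (y, b) = e, f
    return 0.5 if b == x else 0.0

def is_non_signaling(omega, tests_a, tests_b, tol=1e-9):
    """Check that each party's marginal is independent of the other's test."""
    for f in (f for F in tests_b for f in F):
        sums = [sum(omega(e, f) for e in E) for E in tests_a]
        if any(not isclose(s, sums[0], abs_tol=tol) for s in sums):
            return False
    for e in (e for E in tests_a for e in E):
        sums = [sum(omega(e, f) for f in F) for F in tests_b]
        if any(not isclose(s, sums[0], abs_tol=tol) for s in sums):
            return False
    return True

def marginal_a(omega, tests_b):
    """Alice's marginal; for non-signaling states any test of Bob's will do."""
    return lambda e: sum(omega(e, f) for f in tests_b[0])

def conditional_b(omega, tests_b, e):
    """Bob's state given Alice's outcome e (zero if e has probability zero)."""
    p_e = marginal_a(omega, tests_b)(e)
    return lambda f: omega(e, f) / p_e if p_e > 0 else 0.0

omega_a = marginal_a(pr_box, TESTS)
cond = conditional_b(pr_box, TESTS, (1, 0))   # Alice ran test 1 and got 0
```

For the PR box the marginal is uniform on each of Alice's tests, while the conditional state on Bob's side is deterministic on both of his tests.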

Notice that a particular type of measurement, which 
might be thought reasonable, is not included in the 
Cartesian product. This is a joint measurement, where 
Alice first measures her system, and communicates the 
result to Bob, who performs a measurement which de- 
pends on Alice's outcome. Entangled measurements, 
such as are allowed in quantum theory, are also not included.
Hence the Cartesian product $\mathfrak{A} \times \mathfrak{B}$ models a
situation in which Alice and Bob are fairly limited - they 
can act independently and collate the results of their ac- 
tions at a later time, but cannot otherwise communicate. 

It is possible to construct a more sophisticated product
of two test spaces, which does allow for the kind of
two-stage measurements just described (although still not
entangled measurements). Let $\mathfrak{A}\mathfrak{B}$ denote the test space
consisting of the following:

1. All two-stage tests, where a test $E \in \mathfrak{A}$ is performed,
and then, depending on the outcome e
that is obtained, a pre-selected test $F_e \in \mathfrak{B}$ is performed.

2. All two-stage tests, where a test $F \in \mathfrak{B}$ is performed,
and then, depending on the outcome f
that is obtained, a pre-selected test $E_f \in \mathfrak{A}$ is performed.

$\mathfrak{A}\mathfrak{B}$ is called the Foulis-Randall or bilateral product of
test spaces $\mathfrak{A}$ and $\mathfrak{B}$.

The Foulis-Randall product contains the Cartesian
product, $\mathfrak{A} \times \mathfrak{B} \subseteq \mathfrak{A}\mathfrak{B}$, because product tests are a special
case of two-stage tests. Furthermore, if either A or B
is non-classical, then not all two-stage tests are product
tests, so that the containment is strict. The containment
of one test space in another has consequences for
their state spaces. Specifically, if $\mathfrak{X}$ and $\mathfrak{Y}$ are test spaces
such that $\mathfrak{X} \subseteq \mathfrak{Y}$, then the convex set $\Omega(\mathfrak{Y})$ may be in a
higher dimensional space than $\Omega(\mathfrak{X})$, but the restrictions
of states in $\Omega(\mathfrak{Y})$ to $\mathfrak{X}$ (which are well-defined, since every
test in $\mathfrak{X}$ is also a test in $\mathfrak{Y}$) are all contained in
$\Omega(\mathfrak{X})$. In other words, writing $\Omega(\mathfrak{Y})|_{\mathfrak{X}}$ for the set of restrictions
to $\mathfrak{X}$ of states on $\mathfrak{Y}$, we have $\Omega(\mathfrak{Y})|_{\mathfrak{X}} \subseteq \Omega(\mathfrak{X})$.
Because the additional measurements in $\mathfrak{Y}$ place additional
constraints on these states, the containment may
well be strict.

It follows that the restriction of the maximal state
space of the Foulis-Randall product to the Cartesian
product is contained within the maximal state space of
the Cartesian product, $\Omega(\mathfrak{A}\mathfrak{B})|_{\mathfrak{A} \times \mathfrak{B}} \subseteq \Omega(\mathfrak{A} \times \mathfrak{B})$. The
containment is strict if one of the systems is non-classical.
Indeed, the states in $\Omega(\mathfrak{A}\mathfrak{B})|_{\mathfrak{A} \times \mathfrak{B}}$ correspond exactly to
the non-signaling states in $\Omega(\mathfrak{A} \times \mathfrak{B})$. This is demonstrated
in Ref.

We are now prepared to define the class of test and
state spaces for composites in which we shall be interested.
The test space for the composite, which we denote
by $\mathfrak{C}$, is required to contain the Foulis-Randall product
of the components, $\mathfrak{A}\mathfrak{B} \subseteq \mathfrak{C}$. The state space of the composite,
which we denote by $\Omega^{AB}$, is unconstrained beyond
being a subset of the maximal state space, $\Omega^{AB} \subseteq \Omega(\mathfrak{C})$.
Recalling that $\mathfrak{A}\mathfrak{B} \subseteq \mathfrak{C}$ implies $\Omega(\mathfrak{C})|_{\mathfrak{A}\mathfrak{B}} \subseteq \Omega(\mathfrak{A}\mathfrak{B})$ and
that all the states in $\Omega(\mathfrak{A}\mathfrak{B})$ are non-signaling, it follows
that all states in $\Omega^{AB}$ are non-signaling. Indeed,
the main motivation for confining our attention to test
spaces containing $\mathfrak{A}\mathfrak{B}$ is that this is sufficient to ensure
no-signaling without any further constraints on the state
space.

Given a state $\omega^{AB} \in \Omega^{AB}$, the marginals $\omega^A$, $\omega^B$,
and conditionals of the form $\omega^{B|e}$, $\omega^{A|f}$, are defined in
the obvious way by the probabilities which $\omega^{AB}$ assigns
to the product tests. Furthermore, we assume that the
composite systems we consider satisfy the following natural
requirement: that if a test is performed on system
A, the conditional state on system B must be allowed
in the theory, i.e., be contained in $\Omega^B$, and vice versa.
Hence $\Omega^{AB}$ satisfies the constraint that for all e and f
such that $\omega^A(e), \omega^B(f) \neq 0$, $\omega^{B|e}$ and $\omega^{A|f}$ belong to
$\Omega^B$ and $\Omega^A$ respectively. This is enough to ensure that the
marginal states $\omega^A$, $\omega^B$ also belong to the state spaces
of the component systems.



A general composite test space $\mathfrak{C}$ may contain non-product
measurements, which are not contained in the
Foulis-Randall product. Quantum theory, for instance,
has a test space for composites that is larger than the
Foulis-Randall product. If A and B are quantum systems,
so that $A = (\mathfrak{F}(\mathbf{H}), \Omega(\mathbf{H}))$ and $B = (\mathfrak{F}(\mathbf{K}), \Omega(\mathbf{K}))$,
then the quantum joint system is $AB := (\mathfrak{F}(\mathbf{H} \otimes
\mathbf{K}), \Omega(\mathbf{H} \otimes \mathbf{K}))$, which is a composite in our sense and contains
non-product measurement outcomes, for instance,
entangled ones.

Henceforth, AB will stand for a general non-signaling
composite of systems A and B. In the particular case
where $A = (\{E\}, \Delta(E))$ is a classical system, we always
take $\mathfrak{C}$ to be the Foulis-Randall product $\{E\}\mathfrak{B}$. We
also assume that composition of systems is associative,
so that for any three systems A, B and C, there is a
natural isomorphism $A(BC) \simeq (AB)C$.

In addition to specifying how systems combine, a prob- 
abilistic theory must specify what sorts of systems are al- 
lowed. For instance, in finite-dimensional quantum the- 
ory, every dimensionality of Hilbert space defines a differ- 
ent type of system and they are all allowed. Furthermore, 
a classical system of arbitrary dimensionality (that is, 
arbitrary cardinality for the test) can be defined within 
quantum theory as a restriction upon a quantum system 
of the same dimensionality, so in this sense classical sys- 
tems are allowed as well. A probabilistic theory must 
specify the types of systems that are allowed and how 
these compose. We shall confine our attention to the- 
ories incorporating only finite-dimensional systems, and 
those that contain, for any finite set E, the classical system
$(\{E\}, \Delta(E))$. (Thus, for us, quantum theory means
finite-dimensional quantum theory in conjunction with 
classical systems.) For a discussion of what such theories 
might look like in category-theoretic terms, see [ol [l0|. 



B. Joint Entropies, Conditional Entropies, Mutual 
Information 



Consider a composite system $AB = (\mathfrak{C}, \Omega^{AB})$. The
measurement entropy $H(\omega^{AB})$ of a state $\omega^{AB} \in \Omega^{AB}$,
which we will sometimes denote by $H(AB)$, is the infimum
over $E \in \mathfrak{C}$ of $H_E(\omega^{AB})$. In this context, it will
also be understood that $H(A)$ and $H(B)$ stand for the
entropies $H(\omega^A)$ and $H(\omega^B)$ of the marginal states $\omega^A$
and $\omega^B$.



Theorem 3 Measurement entropy is subadditive. That 
is, for any composite AB, 



$$H(AB) \leq H(A) + H(B).$$

Proof. Let $\omega$ be the joint state of AB, with marginal
states $\omega^A$ and $\omega^B$. Choose E and F with $H(\omega^A) =
H_E(\omega^A)$ and $H(\omega^B) = H_F(\omega^B)$. By the definition of
measurement entropy, the definition of a composite, and
the subadditivity of Shannon entropy, we have $H(AB) \leq
H_{EF}(\omega) \leq H_E(\omega^A) + H_F(\omega^B) = H(A) + H(B)$. □



Definition 5 The conditional measurement entropy between
A and B is defined to be

$$H(A|B) := H(AB) - H(B). \qquad (4)$$



Our notation here is less precise than it might be, since 
the joint entropy H{AB) depends on the test space as- 
sociated with the joint system, hence so do conditional 
entropies. We will try to be clear, at any point where the 
question could arise, as to what product is in play. 

Classically, given a joint distribution $\omega^{AB}$ over variables
A and B, one defines the mutual information by



$$I(A:B) = H(A) + H(B) - H(AB), \qquad (5)$$



where $H$ denotes the Shannon entropy. One can regard
this as a measure of how far A and B are from being
independent: by subadditivity, $I(A:B) \ge 0$, with
$I(A:B) = 0$ iff A and B are independent, i.e., $\omega^{AB}$
factorizes. In attempting to extend the concept of mutual
information to more general models, one might very naturally
consider defining $I(A:B)$ to be the maximum of
the mutual informations $I(E:F)$ as E and F range over
tests belonging to systems A and B, respectively. However,
the usual practice in quantum theory is simply to
take Equation (5), with von Neumann entropies replacing
Shannon entropies, as defining mutual information.
In general, this gives a different value. In order to
facilitate comparison with quantum theory, we shall adopt
the following

Definition 6 Let AB be a composite system. The 
measurement-entropy-based mutual information between 
A and B is 



$$I(A:B) := H(A) + H(B) - H(AB). \qquad (6)$$



With this definition, the subadditivity of measurement
entropy (Theorem 3) implies that measurement-entropy-based
mutual information is non-negative. Hereafter, we
will refer to this simply as the "mutual information".
Note that Eq. (5) is a special case of this definition.

Now intuitively, one might expect that the mutual information
$I(A:B)$ between two systems should not decrease
if we recognize that B is a part of some larger
composite system BC, i.e., that $I(A:B) \le I(A:BC)$.
Simple algebraic manipulations (using Eqs. (4) and (6))
allow us to reformulate this condition in various ways.



Lemma 1 The following are equivalent:

(a) $I(A:BC) \ge I(A:B)$

(b) $H(A|BC) \le H(A|B)$

(c) $H(AB) + H(BC) - H(B) \ge H(ABC)$

(d) $I(A:B|C) \ge 0$, where $I(A:B|C) := H(A|C) + H(B|C) - H(AB|C)$,
which is the same condition with the labels B and C interchanged.
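The equivalences in Lemma 1 are pure bookkeeping with Eqs. (4) and (6), and a short numerical sketch can make this concrete. The function name `ssa_slacks` and its argument names are illustrative choices of ours, not notation from the text; the function takes five entropy values as plain numbers and returns the slack in each condition, labelled so that B is the system being conditioned on.

```python
def ssa_slacks(h_a, h_b, h_ab, h_bc, h_abc):
    """Slack in conditions (a)-(d) of Lemma 1, with B as the conditioning system.

    Each condition holds exactly when its slack is non-negative.
    """
    mi_a_bc = h_a + h_bc - h_abc            # I(A:BC), from Eq. (6)
    mi_a_b = h_a + h_b - h_ab               # I(A:B), from Eq. (6)
    cond_a_bc = h_abc - h_bc                # H(A|BC), from Eq. (4)
    cond_a_b = h_ab - h_b                   # H(A|B), from Eq. (4)
    cond_c_b = h_bc - h_b                   # H(C|B)
    cond_ac_b = h_abc - h_b                 # H(AC|B)

    slack_a = mi_a_bc - mi_a_b
    slack_b = cond_a_b - cond_a_bc
    slack_c = h_ab + h_bc - h_b - h_abc
    slack_d = cond_a_b + cond_c_b - cond_ac_b   # I(A:C|B)
    return slack_a, slack_b, slack_c, slack_d
```

Whatever numbers are supplied, the four returned values agree up to rounding, which is all that the equivalence claims.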



Definition 7 The measurement entropy is said to be 
strongly subadditive if it satisfies the equivalent condi- 
tions (a)-(d). 



(We use this terminology despite the fact that it is usually
only condition (c) that goes by the name of "strong
subadditivity" and despite the fact that conditions (a)
and (d) constrain the measurement entropy only through
the definitions of $I(A:BC)$ and $I(A:B|C)$.) A probabilistic
theory in which conditions (a)-(d) are satisfied
for all systems A, B and C will also be called strongly
subadditive.

Both the Shannon and von Neumann entropies are 
strongly subadditive. In the former case, this is a 
straightforward exercise, but in the latter, a relatively 
deep fact. Colloquially, this means that in classical and 
quantum theories, just forgetting about or discarding a 
system C never increases one's mutual information be- 
tween systems A and B. As the following shows, how- 
ever, strong subadditivity can fail in general theories, 
even when two of the three systems are classical. One po- 
tential gloss is that discarding or forgetting about system 
C can increase the mutual information between systems 
A and B. But a more sensible reading is perhaps that 
the quantity defined as mutual information should not in 
the general case be interpreted as "the information one 
system contains about another." 



FIG. 6: The component test spaces for Example 4.



Example 4 (Failure of strong subadditivity of the measurement
entropy.) Consider a tripartite system $ABC$,
where A and B are classical bits and C is a squit, with
test space $\{\{e, e'\}, \{f, f'\}\}$. Consider the joint state
described by the following table:





         e      e'     f      f'
  00    1/4     0     1/4     0
  01    1/4     0      0     1/4
  10     0     1/4    1/4     0
  11     0     1/4     0     1/4



In words, the outcome of test $\{e, e'\}$ is perfectly correlated
with A while the outcome of the test $\{f, f'\}$ is perfectly
correlated with B. It is easily verified that

$$H(C) = H(AC) = H(BC) = 1.$$

If all three systems are measured, with the test on C perhaps
depending on the values of A and B, there are always
four distinct outcomes, each with probability 1/4.
Hence $H(ABC) = 2$ and

$$I(A:B|C) = H(AC) + H(BC) - H(C) - H(ABC) = 1 + 1 - 1 - 2 = -1 < 0,$$

which contradicts form (d) of strong subadditivity.
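The entropies quoted in Example 4 can be checked by brute force. The following Python sketch is ours and rests on one stated assumption: the tests of the composite are those in which the classical bits are read and a test on C is then chosen, possibly depending on the values read. All identifiers (`STATE`, `marginal`, `measurement_entropy`) are hypothetical names introduced for illustration.

```python
from itertools import product
from math import log2

def shannon(probs):
    """Shannon entropy in bits, ignoring zero-probability outcomes."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

Q = 0.25
# STATE[(a, b)][outcome of C], copied from the table of Example 4.
STATE = {
    (0, 0): {"e": Q, "e'": 0.0, "f": Q, "f'": 0.0},
    (0, 1): {"e": Q, "e'": 0.0, "f": 0.0, "f'": Q},
    (1, 0): {"e": 0.0, "e'": Q, "f": Q, "f'": 0.0},
    (1, 1): {"e": 0.0, "e'": Q, "f": 0.0, "f'": Q},
}
TESTS_C = [("e", "e'"), ("f", "f'")]

def marginal(keep):
    """Sum out the classical bits whose positions are not listed in `keep`."""
    out = {}
    for ab, row in STATE.items():
        key = tuple(ab[i] for i in keep)
        acc = out.setdefault(key, {o: 0.0 for o in row})
        for o, p in row.items():
            acc[o] += p
    return out

def measurement_entropy(table):
    """Minimum Shannon entropy over tests on C chosen per classical value."""
    keys = list(table)
    best = float("inf")
    for choice in product(TESTS_C, repeat=len(keys)):
        probs = [table[k][o] for k, test in zip(keys, choice) for o in test]
        best = min(best, shannon(probs))
    return best

H_C = measurement_entropy(marginal(()))
H_AC = measurement_entropy(marginal((0,)))
H_BC = measurement_entropy(marginal((1,)))
H_ABC = measurement_entropy(marginal((0, 1)))
COND_MI = H_AC + H_BC - H_C - H_ABC   # I(A:B|C)
print(H_C, H_AC, H_BC, H_ABC, COND_MI)
```

The minimum for $H(AC)$ is reached by the test $\{e, e'\}$ and the minimum for $H(BC)$ by $\{f, f'\}$, which is why no single test on C serves both marginals at once.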



Note that Example 4 is all but classical,
depending not on any notion of entanglement or non-locality,
but only on the fact that one can measure either,
but never both, of $\{e, e'\}$ and $\{f, f'\}$.

This section concludes with some lemmas, which hold
in the special case that one or more of the systems in the
composite is classical. Some of these are useful later on.

Lemma 2 Let $\omega^{AB}$ be a state on AB, where A is classical.
Then

$$H(\omega^{AB}) = H(\omega^A) + \sum_{e \in E} \omega^A(e) H(\omega^{B|e}). \qquad (7)$$

The proof is straightforward. As a shorthand, when A is
classical we might write

$$H(AB) = H(A) + \sum_e p(e) H(B|e).$$

Corollary 1 If A is classical, then $H(B|A) \ge 0$ for any
system B.

The proof is immediate from Eqs. (4) and (7).

Corollary 2 If A is classical and independent of B, then
$H(AB) = H(A) + H(B)$.

Proof. The assertion that A and B are independent
means that the joint state is $\omega^{AB} = \omega^A \otimes \omega^B$, i.e., that
$\omega^{B|e} = \omega^B$ for all $e \in E$. By Lemma 2 we have

$$H(AB) = H(A) + \sum_{e \in E} \omega^A(e) H(\omega^{B|e}) = H(A) + \Big(\sum_{e \in E} \omega^A(e)\Big) H(B) = H(A) + H(B).$$

□

Finally, strong subadditivity does hold in the special
case that systems A and C in Lemma 1 are classical.
Colloquially, discarding a classical system can never result
in an increase in the mutual information between a
general system and another classical system.

Lemma 3 Let A and C be classical. Then for any system B,

$$H(A|BC) \le H(A|C).$$

Hence, the equivalent conditions of Lemma 1 are satisfied
(with C as the conditioning system, as in condition (d)).

Proof. Let $A = \{E\}$, $C = \{G\}$, and let the joint state of
ABC be $\omega^{ABC}$. Then the marginal state of BC satisfies
$\omega^{BC}(fg) = \omega^C(g)\,\omega^{B|g}(f)$, where, for all $g \in G$,

$$\omega^{B|g} = \sum_{e \in E} \frac{\omega^{AC}(eg)}{\omega^C(g)}\, \omega^{B|eg}.$$

By Lemma 2 we have

$$H(A|BC) = H(ABC) - H(BC) = H(AC) + \sum_{g \in G} \sum_{e \in E} \omega^{AC}(eg) H(\omega^{B|eg}) - H(C) - \sum_{g \in G} \omega^C(g) H(\omega^{B|g}).$$

We can rewrite this as

$$H(A|BC) = H(A|C) + \sum_{g \in G} \sum_{e \in E} \omega^{AC}(eg) H(\omega^{B|eg}) - \sum_{g \in G} \omega^C(g) H(\omega^{B|g}). \qquad (8)$$

Since measurement entropy is concave,

$$H(\omega^{B|g}) \ge \sum_{e \in E} \frac{\omega^{AC}(eg)}{\omega^C(g)} H(\omega^{B|eg}),$$

whence

$$\sum_{g \in G} \omega^C(g) H(\omega^{B|g}) - \sum_{g \in G} \sum_{e \in E} \omega^{AC}(eg) H(\omega^{B|eg}) \ge 0,$$

which, combined with Equation (8), gives the desired result that
$H(A|BC) \le H(A|C)$. □


C. Data Processing and the Holevo Bound 



A fundamental result in quantum information theory,
the Holevo bound, asserts that if Alice prepares a quantum
state $\rho = \sum_{x \in E} p_x \rho_x$ for Bob, then, for any
measurement F that Bob can make on his system,

$$I(E:F) \le \chi,$$

where $\chi := H(\rho) - \sum_{x \in E} p_x H(\rho_x)$ (often called the
Holevo quantity).


This inequality makes sense in our more general setting.
Suppose that Alice has a classical system
$A = (\{E\}, \Delta(E))$ and Bob a general system B. Alice's
system is to serve as a record of which state of B she
prepared. Hence the situation above is modeled by the
joint state $\omega^{AB} = \sum_{x \in E} p_x \delta_x \otimes \beta_x$, where $\delta_x$ is a
deterministic state of Alice's system with $\delta_x(x) = 1$. Bob's
marginal state is $\omega^B = \sum_{x \in E} p_x \beta_x$. By Lemma 2,
$H(\omega^{AB}) = H(A) + \sum_{x \in E} p_x H(\beta_x)$. Hence,



$$\begin{aligned}
I(A:B) &= H(A) + H(B) - H(AB) \\
&= H(A) + H(B) - \Big(H(A) + \sum_{x \in E} p_x H(\beta_x)\Big) \\
&= H(\omega^B) - \sum_{x \in E} p_x H(\beta_x).
\end{aligned}$$



Accordingly, the content of the Holevo bound is simply
that the mutual information between the measurement of
Alice's classical system and any measurement on Bob's
system is no greater than $I(A:B)$,

$$I(E:F) \le I(A:B).$$



While this is certainly natural, it does not always hold. 



Example 5 (Failure of the Holevo bound.) Let A be a
classical bit, $A = \{\{0, 1\}\}$, and B a squit,
$B = \{F = \{f, f'\}, G = \{g, g'\}\}$, and consider the state

         f      f'     g      g'
   0    1/2     0     1/2     0
   1    1/2     0      0     1/2

It is easy to check that $H(A) = H(AB) = H_G(B) = 1$
and $H(B) = H_F(B) = 0$. Hence,

$$I(A:B) = H(A) + H(B) - H(AB) = 1 + 0 - 1 = 0,$$

while, for Alice's test $E = \{0, 1\}$ and Bob's test $G$,

$$I(E:G) = 1 + 1 - 1 = 1 > 0.$$



Both strong subadditivity and the Holevo bound are
instances of a more basic principle. The data processing
inequality (DPI) asserts that, for any systems A and B
and any physical process $\mathcal{E}: B \to C$ [47],

$$I(A : \mathcal{E}(B)) \le I(A:B).$$

The strong subadditivity of entropy amounts to the 
DPI for the process that simply discards a system (the 
marginalization map $BC \to C$). The Holevo bound is the
DPI for the special case of measurements, which can be 
understood as processes taking a system into a classical 
system which records the outcome. 
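For orientation, here is how the DPI looks in the purely classical case, where it always holds. The sketch below is an illustration of ours, not part of the paper's argument: the joint distribution `JOINT` is invented, the second variable is a pair, and the "process" is the deterministic map that discards the second component of that pair.

```python
from math import log2

def shannon(probs):
    """Shannon entropy in bits, ignoring zero-probability outcomes."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X:Y) in bits for a joint distribution given as {(x, y): p}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return shannon(px.values()) + shannon(py.values()) - shannon(joint.values())

def apply_map(joint, f):
    """Push Y through a deterministic map f, merging outcomes that coincide."""
    out = {}
    for (x, y), p in joint.items():
        key = (x, f(y))
        out[key] = out.get(key, 0.0) + p
    return out

# Hypothetical distribution: X is a bit, Y is a pair of bits.
JOINT = {(0, (0, 0)): 0.4, (0, (1, 1)): 0.1, (1, (0, 1)): 0.1, (1, (1, 0)): 0.4}

BEFORE = mutual_information(JOINT)
AFTER = mutual_information(apply_map(JOINT, lambda y: y[0]))  # discard 2nd bit
```

Here `AFTER` cannot exceed `BEFORE`; Examples 4 and 5 show that the analogous statement for measurement entropy can fail.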

It seems reasonable that discarding a system, or per- 
forming a measurement, should be allowed processes in 
a physical theory. But a notion of mutual information, 
according to which discarding a system, or performing a 
measurement, causes a gain of mutual information seems 
bizarre. So it is an attractive idea that a physical the- 
ory should allow at least some definition of entropy and 
mutual information such that the corresponding DPI is 
satisfied. 



V. INFORMATION CAUSALITY



In [27], Pawlowski et al. define a principle they call
Information Causality in terms of the following protocol.
Alice and Bob share a joint non-signaling state, known
to both parties. Alice receives a random bit string E of
length N, makes measurements, and sends Bob a message
F of no more than m bits. Bob receives a random
variable G encoding a number $k = 1, \ldots, N$, instructing
him to guess the value of Alice's kth bit $E_k$. Bob thereupon
makes a suitable measurement and, based upon its
outcome and the message from Alice, produces his guess,
$b_k$. Information causality is the condition that



$$\sum_{k=1}^{N} I(E_k : b_k \mid G = k) \le m. \qquad (9)$$



The main result of [23] is that if a theory contains
states that violate the CHSH inequality ll| by more 
than the Tsirel'son bound then it violates informa- 
tion causality. In particular, if Alice and Bob can share 
PR boxes, then using a protocol due to van Dam [sl ]. 
they can violate information causality maximally, meaning
that Bob's guess is correct with certainty, and the
left hand side of Equation (9) is N. Pawlowski et al.
also give a proof, using fairly standard manipulations of
quantum mutual information, that quantum theory does
satisfy information causality.

Having seen how to define notions of entropy and mutual
information for general systems, it is interesting to
consider where Pawlowski et al.'s quantum proof breaks
down for some non-quantum systems such as PR boxes.
One issue is that the proof uses strong subadditivity. As
the following subsection shows, in the case where a PR
box is the shared state, the van Dam protocol itself provides
an example of the failure of strong subadditivity of
the measurement entropy. Section V B provides a converse
result: any theory that is monoentropic and strongly
subadditive, and in which the Holevo bound holds, must
satisfy information causality.

First, a few words about how to describe this setting in
our terminology. Let Alice and Bob share two systems A
and B, where each of these, as usual, has an associated
test space. The joint test space of AB is immaterial,
as long as it includes the Foulis-Randall product (i.e.,
allows all the separable measurements). The bit strings
E and F are regarded as classical systems in their own
right and the joint test space for a classical and a general
system is, as always, assumed to be the Foulis-Randall
product. Systems A and B begin the protocol in some
joint non-signaling state $\omega^{AB}$.



A. The van Dam protocol. 



Consider a special case of the protocol described above,
in which Alice and Bob share a PR box. Alice is supplied
with a two-bit string $E = E_1E_2$, and transmits one bit
F to Bob. Let the PR box be a state of two systems
A and B, where A and B are squits corresponding to
the test spaces $\{\{a_1, a_1'\}, \{a_2, a_2'\}\}$ and
$\{\{b_1, b_1'\}, \{b_2, b_2'\}\}$ respectively. The joint state of A and B is





          a_1    a_1'   a_2    a_2'
  b_1      0     1/2     0     1/2
  b_1'    1/2     0     1/2     0
  b_2      0     1/2    1/2     0
  b_2'    1/2     0      0     1/2



It can be verified that these outcome probabilities are
indeed the PR box correlations, violating the CHSH inequality
maximally. In van Dam's protocol, Alice determines
the parity $E_1 \oplus E_2$ (where $\oplus$ denotes addition mod
2). If this is zero she performs the $\{a_1, a_1'\}$ measurement
on her system; if it is 1, she performs the $\{a_2, a_2'\}$
measurement. She then sends Bob a single bit with a value
equal to the parity of her outcome and $E_1$ (where unprimed
outcomes correspond to 0 and primed outcomes
to 1). Bob can then determine the value of $E_1$ by measuring
$\{b_1, b_1'\}$, or the value of $E_2$ by measuring $\{b_2, b_2'\}$.
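The claim that the table describes PR box correlations can be checked directly. The Python sketch below is ours: it copies the table into a dictionary named `TABLE`, assigns the value +1 to unprimed and -1 to primed outcomes (a convention we chose for computing correlators), and evaluates one standard form of the CHSH combination together with a no-signaling check on Alice's marginals.

```python
HALF = 0.5
# TABLE[Bob's outcome][Alice's outcome], copied from the joint state above.
TABLE = {
    "b1":  {"a1": 0.0, "a1'": HALF, "a2": 0.0, "a2'": HALF},
    "b1'": {"a1": HALF, "a1'": 0.0, "a2": HALF, "a2'": 0.0},
    "b2":  {"a1": 0.0, "a1'": HALF, "a2": HALF, "a2'": 0.0},
    "b2'": {"a1": HALF, "a1'": 0.0, "a2": 0.0, "a2'": HALF},
}

def sign(outcome):
    """Map unprimed outcomes to +1 and primed outcomes to -1."""
    return -1 if outcome.endswith("'") else 1

def correlator(x, y):
    """Expected product of the signs when Alice runs test x and Bob test y."""
    return sum(sign(a) * sign(b) * TABLE[b][a]
               for a in (f"a{x}", f"a{x}'")
               for b in (f"b{y}", f"b{y}'"))

def chsh():
    """One standard CHSH combination; local models satisfy |chsh| <= 2."""
    return (correlator(1, 1) + correlator(1, 2)
            + correlator(2, 1) - correlator(2, 2))

def alice_marginal(x, y):
    """Alice's distribution on test x, computed while Bob runs test y."""
    return [sum(TABLE[b][a] for b in (f"b{y}", f"b{y}'"))
            for a in (f"a{x}", f"a{x}'")]
```

The absolute value of `chsh()` reaches the algebraic maximum of 4, and `alice_marginal(x, 1)` equals `alice_marginal(x, 2)` for both of Alice's tests, so the state is non-signaling from Bob to Alice.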



Consider now an intermediate stage in this protocol, 
at which Alice has measured her system, and sent the bit 
F to Bob, who has not yet measured his system. Bob has 
access to systems B and F, but does not know the out- 
come of Alice's measurement. Hence consider the joint 
state of EFB, averaged over the outcomes of Alice's mea- 
surement. This is easily verified to be 



  E_1E_2F    b_1    b_1'   b_2    b_2'
    000      1/8     0     1/8     0
    001       0     1/8     0     1/8
    111      1/8     0     1/8     0
    110       0     1/8     0     1/8
    010      1/8     0      0     1/8
    011       0     1/8    1/8     0
    101      1/8     0      0     1/8
    100       0     1/8    1/8     0



Minimizing over the possible measurement choices on B,

$$H(E_1, F, B) = H(E_2, F, B) = H(F, B) = 2.$$

But clearly, $H(E, F, B) = 3$, so

$$I(E_1 : E_2 | F, B) = 2 + 2 - 2 - 3 = -1 < 0.$$
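These four entropies follow mechanically from the table for the joint state of EFB. The sketch below takes that table as given (it does not re-derive it from the PR box) and assumes, as before, that Bob's test may be chosen depending on the classical values; the names `ROWS`, `marginal` and `measurement_entropy` are ours.

```python
from itertools import product
from math import log2

def shannon(probs):
    """Shannon entropy in bits, ignoring zero-probability outcomes."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

E8 = 1 / 8
# ROWS[E1 E2 F] = weights of (b1, b1', b2, b2'), copied from the table.
ROWS = {
    "000": (E8, 0, E8, 0), "001": (0, E8, 0, E8),
    "111": (E8, 0, E8, 0), "110": (0, E8, 0, E8),
    "010": (E8, 0, 0, E8), "011": (0, E8, E8, 0),
    "101": (E8, 0, 0, E8), "100": (0, E8, E8, 0),
}
TESTS_B = [(0, 1), (2, 3)]   # column indices of {b1, b1'} and {b2, b2'}

def marginal(keep):
    """Keep the classical positions in `keep` (0: E1, 1: E2, 2: F)."""
    out = {}
    for label, row in ROWS.items():
        key = "".join(label[i] for i in keep)
        acc = out.setdefault(key, [0.0] * 4)
        for j, p in enumerate(row):
            acc[j] += p
    return out

def measurement_entropy(table):
    """Minimize over assignments of a test on B to each classical value."""
    keys = list(table)
    return min(
        shannon([table[k][j] for k, test in zip(keys, choice) for j in test])
        for choice in product(TESTS_B, repeat=len(keys))
    )

H_E1FB = measurement_entropy(marginal((0, 2)))
H_E2FB = measurement_entropy(marginal((1, 2)))
H_FB = measurement_entropy(marginal((2,)))
H_EFB = measurement_entropy(marginal((0, 1, 2)))
COND_MI = H_E1FB + H_E2FB - H_FB - H_EFB   # I(E1:E2|F,B)
```

The minimum for `H_E1FB` uses the test $\{b_1, b_1'\}$ throughout and the minimum for `H_E2FB` uses $\{b_2, b_2'\}$, mirroring the structure of Example 4.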

B. Theories satisfying information causality 



As the previous subsection observes, the van Dam protocol
involves a joint state on a classical-nonclassical composite
system, which does not satisfy strong subadditivity
of entropy. This is enough to prevent the proof of information
causality going through. This subsection proves
a converse result.

Theorem 4 Suppose that a theory

1. is monoentropic, meaning that measurement entropy
equals mixing entropy for all systems,

2. is strongly subadditive, and

3. satisfies the Holevo bound.

Then the theory satisfies information causality. It follows
that any theory satisfying these conditions cannot violate
Tsirel'son's bound.

Note that, as discussed in Section IV C, the second and
third conditions both follow from a single assumption
of a data processing inequality. Note also that in proving
Theorem 4 the condition that a theory be monoentropic
is only used to establish the technical condition
that $H(A|B) \ge 0$ when A is classical. So the theorem
would still be valid if the monoentropic assumption
were replaced by a direct assumption that for classical A,
$H(A|B) \ge 0$. Absent such a direct assumption, we begin with

Lemma 4 Suppose that a theory is monoentropic and
that A is a classical system. Then $H(A|B) \ge 0$ for any
system B.

Proof. (Lemma 4) Suppose that A is a classical system,
and that the joint state of AB is $\omega^{AB}$. If the measurement
and mixing entropies are equal, then Lemma 2
immediately gives

$$S(AB) = S(A) + \sum_x p_x S(\beta_x),$$

where $p_x = \omega^A(x)$ and $\beta_x$ is the state of B conditioned
on x. Recall that the mixing entropy of a state is defined
in terms of an infimum over convex decompositions into
pure states. For a fixed $\epsilon$, call a convex decomposition
of a state $\omega$ into pure states $\epsilon$-optimal if the Shannon
entropy of the coefficients is $\le S(\omega) + \epsilon$. For any $\epsilon > 0$,
there is an $\epsilon$-optimal decomposition. Let

$$\beta_x = \sum_y q_{y|x} \beta_{xy}$$

be an $\epsilon$-optimal convex decomposition of $\beta_x$ into pure
states $\beta_{xy}$. It follows that

$$\omega^B = \sum_x \sum_y p_x q_{y|x} \beta_{xy}$$

is a (possibly far from optimal) convex decomposition of
$\omega^B$ into pure states. Hence $S(B)$ is less than or equal to
the Shannon entropy of the coefficients on the right hand
side. Therefore

$$\begin{aligned}
S(B) &\le H(p_x) + \sum_x p_x H(q_{y|x}) \\
&\le S(A) + \sum_x p_x \big(S(\beta_x) + \epsilon\big) \\
&= S(AB) + \epsilon.
\end{aligned}$$

Since this holds for any $\epsilon > 0$, we have $S(B) \le S(AB)$ and
$S(A|B) = S(AB) - S(B) \ge 0$ as required. □
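The first inequality in the proof rests on the chain rule for the Shannon entropy of the product coefficients $p_x q_{y|x}$. A small Python check of that rule, on weights we picked arbitrarily (`P` and `Q` are illustrative, not from the text), is:

```python
from math import isclose, log2

def shannon(probs):
    """Shannon entropy in bits, ignoring zero-probability outcomes."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

# Hypothetical weights: p_x for the classical system, q_{y|x} per value of x.
P = [0.5, 0.3, 0.2]
Q = [[0.25, 0.75], [1.0], [0.1, 0.6, 0.3]]

# Coefficients p_x * q_{y|x} of the combined decomposition of Bob's state.
COEFFS = [p * q for p, qs in zip(P, Q) for q in qs]

LHS = shannon(COEFFS)
RHS = shannon(P) + sum(p * shannon(qs) for p, qs in zip(P, Q))
```

`LHS` and `RHS` agree up to rounding. If some of the pure states in the combined decomposition happen to coincide, merging them can only lower the entropy of the coefficients, which is why the proof states an inequality rather than an equality for $S(B)$.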

Given Lemma 4, the proof of Theorem 4 is essentially a
reconstruction of the quantum argument of Appendix A
of [13], adapted to the broader setting of non-signaling
states on test spaces. In its form the proof is the same,
but great care must be taken at each step to ensure that
the relevant properties of entropies and mutual information
still hold. Many of the steps still go through in
virtue of generic properties of the measurement entropy.
The explicit assumptions of Theorem 4 are needed for
the rest.



Proof. (Theorem 4) Assume that Alice and Bob share
a joint system AB. Consider the N-bit string which Alice
receives as a classical system E, and consider the m-bit
message which Alice sends to Bob as a classical system
F. Let $E_k$ denote Alice's kth bit. Consider the stage
of the protocol where Alice has measured system A, and
sent F to Bob, but Bob has not yet measured system
B. Bob has control of systems F and B at this point,
and does not know the outcome of Alice's measurement.
Hence the strategy is to consider the joint state of the
systems E, F and B, averaged over Alice's outcomes.

The first goal is to show that the joint state at this
point satisfies

$$I(E:FB) \le m. \qquad (10)$$

By the fact that the initial state of AB is non-signaling,
E is independent of B. Therefore Corollary 2 yields
$I(E:B) = 0$. Using this, along with the definitions
and straightforward algebraic manipulation, we get

$$\begin{aligned}
I(E:FB) &= I(E:B) + I(E:F|B) \\
&= I(E:F|B) \\
&= I(EB:F) - I(B:F).
\end{aligned}$$

By Theorem 3, mutual information is non-negative, so

$$I(E:FB) \le I(EB:F). \qquad (11)$$

Now, $I(EB:F) = H(EB) + H(F) - H(EFB) = H(F) - H(F|EB)$.
By the assumption that the theory
is monoentropic, and Lemma 4, $H(F|EB) \ge 0$. So

$$I(EB:F) \le H(F) \le m. \qquad (12)$$

This gives Equation (10).
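The two identities used on the way to Equation (11) follow from the definitions alone, so they can be sanity-checked on any assignment of entropies that comes from a genuine joint distribution. The sketch below does this in the purely classical case, with a randomly generated distribution over three variables standing in for E, F and B; this classical stand-in and all identifiers are assumptions of ours, chosen only to exercise the algebra.

```python
import random
from math import isclose, log2

def shannon(probs):
    """Shannon entropy in bits, ignoring zero-probability outcomes."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

# A reproducible random joint distribution over (e, f, b).
rng = random.Random(0)
weights = {(e, f, b): rng.random()
           for e in range(2) for f in range(2) for b in range(3)}
total = sum(weights.values())
JOINT = {k: w / total for k, w in weights.items()}
E, F, B = 0, 1, 2   # positions of the three variables in each outcome tuple

def entropy(*systems):
    """Shannon entropy of the marginal on the listed positions."""
    marg = {}
    for outcome, p in JOINT.items():
        key = tuple(outcome[i] for i in systems)
        marg[key] = marg.get(key, 0.0) + p
    return shannon(marg.values())

def mi(xs, ys):
    """I(X:Y) as in Eq. (6)."""
    return entropy(*xs) + entropy(*ys) - entropy(*xs, *ys)

def cond_mi(xs, ys, zs):
    """I(X:Y|Z) = H(X|Z) + H(Y|Z) - H(XY|Z), expanded with Eq. (4)."""
    return (entropy(*xs, *zs) + entropy(*ys, *zs)
            - entropy(*zs) - entropy(*xs, *ys, *zs))

CHAIN_LHS = mi((E,), (F, B))
CHAIN_RHS = mi((E,), (B,)) + cond_mi((E,), (F,), (B,))
SWAP_RHS = mi((E, B), (F,)) - mi((B,), (F,))
```

`CHAIN_LHS` matches `CHAIN_RHS`, and `cond_mi((E,), (F,), (B,))` matches `SWAP_RHS`. Because only Eqs. (4) and (6) are used, the same cancellations go through for measurement entropies.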

The next step is to establish

$$\sum_{k=1}^{N} I(E_k : FB) \le I(E:FB). \qquad (13)$$

Rearrangement of definitions yields

$$I(E:FB) = I(E_1 \ldots E_N : FB) = I(E_1 : FB) + I(E_2 \ldots E_N : FB | E_1)$$

and

$$I(E_2 \ldots E_N : FB | E_1) = I(E_2 \ldots E_N : FBE_1) - I(E_2 \ldots E_N : E_1).$$

Since the distribution on E is uniform (the bits are independent),
$I(E_2 \ldots E_N : E_1) = 0$. Hence,

$$I(E:FB) = I(E_1 : FB) + I(E_2 \ldots E_N : FBE_1).$$

By strong subadditivity,

$$I(E_2 \ldots E_N : FBE_1) \ge I(E_2 \ldots E_N : FB).$$

So

$$I(E:FB) \ge I(E_1 : FB) + I(E_2 \ldots E_N : FB).$$

Applying this inequality recursively gives Equation (13).
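Inequality (13) can be strict, as a purely classical toy case shows. In the sketch below, which is our own illustration, two independent uniform bits play the role of E and a classical record Y equal to their parity stands in for FB: each bit alone shares no information with Y, while the pair shares one bit.

```python
from math import log2

def shannon(probs):
    """Shannon entropy in bits, ignoring zero-probability outcomes."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

def mutual_information(joint, xs, ys):
    """I(X:Y) in bits; xs and ys are tuples of positions in each outcome."""
    def entropy(keep):
        marg = {}
        for outcome, p in joint.items():
            key = tuple(outcome[i] for i in keep)
            marg[key] = marg.get(key, 0.0) + p
        return shannon(marg.values())
    return entropy(xs) + entropy(ys) - entropy(xs + ys)

# Outcomes are (e1, e2, y) with y = e1 XOR e2 (hypothetical stand-in for FB).
JOINT = {(e1, e2, e1 ^ e2): 0.25 for e1 in (0, 1) for e2 in (0, 1)}

SUM_SINGLE = (mutual_information(JOINT, (0,), (2,))
              + mutual_information(JOINT, (1,), (2,)))
WHOLE = mutual_information(JOINT, (0, 1), (2,))
```

Here `SUM_SINGLE` is 0 and `WHOLE` is 1, consistent with (13). The independence of the bits is essential: it is what makes $I(E_2 \ldots E_N : E_1)$ vanish in the derivation.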

Finally, consider the last stage of the protocol. If Bob
is instructed to guess the kth bit, then, depending on the
message F, he measures system B. This can be seen as a
single joint measurement Xk on the system FB. [i^ The 
Holevo bound, combined with Equations (10) and (13), gives

$$\sum_{k=1}^{N} I(E_k : X_k) \le m.$$

Finally, Bob outputs a guess $b_k$ for the value of $E_k$, where
the guess depends on k and on the outcome of the measurement
$X_k$. The usual data processing inequality applied
to classical mutual information yields

$$\sum_{k=1}^{N} I(E_k : b_k \mid G = k) \le m,$$

which is information causality. □



VI. CONCLUSIONS, DISCUSSION AND 
FURTHER QUESTIONS 



We have defined preparation and measurement based 
generalizations of quantum and classical entropy and mu- 
tual and conditional information, and studied some of 
their basic properties. We called theories in which they 
coincide monoentropic, and showed that if they in ad- 
dition satisfy the data processing inequality (or at least 
its corollaries strong subadditivity and the generalized 
Holevo bound), Pawlowski et al.'s information causality
principle holds. By their remarkable result that any cor- 
relations violating the Tsirel'son bound can be used to vi- 
olate information causality, it follows that monoentropic 
theories satisfying data processing must, like quantum 
theory, obey the Tsirel'son bound. Monoentropicity is
a strong constraint on theories, as we have shown by es- 
tablishing that it fails for all polytopes except simplices. 

Our results indicate that it is interesting and profitable 
to develop notions of entropy, and allied notions of condi- 
tional entropy and mutual information, for abstract prob- 
abilistic models. This paper should be regarded as only 
a preliminary exploration of this possibility. 

A natural direction for further research is to study data 
compression and channel capacities in the abstract set- 
ting of this paper. It is natural to seek a measure of 
entropy that governs the rate of high-fidelity data compression,
as Shannon and von Neumann entropy do in classical and quantum
theory. A first step toward exploring classical channel capacities
in generalized probabilistic theories might be to identify sufficient
conditions for the Holevo bound to hold. This is related to
the issue of finding an operationally motivated defini- 
tion of mutual information. Arguably, a properly mo- 
tivated notion of mutual information should manifestly 
be monotonic. Of course, the monotonicity of quan- 
tum mutual information — equivalently, the strong sub- 
additivity of quantum entropy — is not manifest from its 
usual functional form. Still, the outright failure of the 
measurement-entropy-based mutual information to sat- 
isfy monotonicity in some cases raises a question as to its 
significance. Although in such cases measurement-based 
mutual information cannot be used to establish infor- 
mation causality through a proof parallel to Pawlowski 
et al.'s quantum proof, it could be that IC nevertheless
holds in some such cases. One should be cautious,
though, about dismissing natural generalizations of clas- 
sical quantities on the grounds that they fail to satisfy 
intuitively compelling properties. A case in point is the 
history of skepticism, based on the fact that it can be neg- 
ative, about the operational significance of conditional in- 
formation in quantum information theory. It was known 
for many years that the quantum conditional information
can be negative, but it was eventually shown to have an
operational interpretation, involving the rate for quan- 
tum state merging protocols. It is also good to keep in 
mind that different operational motivations might turn 
out to be naturally associated with different entropic 
quantities, each with reasonable claim to be called mu- 
tual information. 

At a more fundamental level, one would like to un- 
derstand better the operational significance of various 
notions of entropy for abstract probabilistic models and 
theories. It is likely that the entropic quantities we have 
discussed here, measurement and mixing entropy, will 
turn out not to be the best notions of entropy to use in many
situations. For example, in Appendix O we considered a 
variation (or perhaps better, a specialization) of the no- 
tion of measurement entropy that is more tightly coupled 
to the geometry of the state space. 

We have seen that, taken together, the conditions of 
monoentropicity, strong subadditivity, and the Holevo 
bound, imply information causality. It is not out of the 
question that some subset of these conditions would suf- 
fice (especially since we need only very special cases of 
strong subadditivity). Alternatively, it would be of inter- 
est to find a single, reasonably simple physical postulate 
that would imply all three of these conditions. It seems 
plausible that such a postulate exists. On the one hand, 
strong subadditivity and the Holevo bound are both spe- 
cial cases of the data processing inequality, which in turn 
can be derived (as we will detail in a future paper) from 
the assumption that arbitrary processes can be dilated 
to reversible ones. On the other hand, as we show in 
Appendix B, monoentropicity can be derived from conditions
of a similar flavor, involving the dilatability of
mixed states to pure states with a "marginal steering" 
property. Another avenue to explore is the consequence 
of monoentropicity that is needed for the IC proof: posi- 
tivity of conditional information when a classical system 
is conditioned upon a general one. Although its opera- 
tional interpretation is not evident at first blush, it war- 
rants further study. 

We hope to discuss all of these matters in detail in a 
future paper. 

APPENDIX A: NON-CONCAVITY OF MIXING
ENTROPY



In this appendix we prove Theorem 2, which states
that the mixing entropy is not concave for nonsimplicial
polytopes. As a preliminary to the proof, we state some
basic definitions and facts that we will use. A face of a
convex set C is a set $F \subseteq C$ such that every $x \in C$
that can appear in a convex decomposition of an element
of F is also in F. A maximal face of C is one that is not
a proper subset of any face of C other than C itself. An
exposed face of C is a subset of C that is the intersection
of C with a hyperplane supporting it (such a subset is
easily shown to be a face). All faces of a polytope are
exposed, and the maximal ones have affine codimension
1, i.e. their spans are affine hyperplanes. We denote the
affine space generated by a set S by aff(S), the linear
span of S by lin(S), and the cone generated by S (i.e.
the set of nonnegative linear combinations of elements of
S) by cone(S). Note that when a subset S of a real vector
space contains 0, aff(S) = lin(S). The relative interior
of a convex compact set C is the interior of C when it
is considered as a subset of aff(C). Finally, we'll use the
term Z-ball, where Z is an affine subspace of the ambient
vector space, to mean a subset of Z that is a ball in Z.

The proof relies on the following lemma, proven below. 



Lemma 5 The mixing entropy fails to be concave for
any d-dimensional nonsimplicial polytope all the maximal
faces of which are (d-1)-dimensional simplices.

We begin by proving the theorem.
Proof. (Theorem). First note that any counterexample
to concavity of the mixing entropy in a polytope S will
also be a counterexample in a polytope S' that has S as a
face. This follows from the fact that if S is a face of S', only
states in S can appear in convex decompositions of states in S.
The proof of the theorem is by induction.

Suppose as our induction hypothesis that the mixing 
entropy fails to be concave for nonsimplicial polytopes 
in dimension d. For every nonsimplicial polytope in dimension d + 1
either (i) every maximal face is simplicial, or (ii) there is 
a maximal face that is nonsimplicial. If case (ii) applies, 
then there is a face that constitutes a nonsimplicial poly- 
tope of dimension d and by our induction hypothesis, the 
mixing entropy fails to be concave for this face. If case 
(i) applies, then the polytope satisfies the conditions of 
the lemma and the mixing entropy fails to be concave by 
virtue of the lemma. 

To complete the inductive argument we need to show 
that the mixing entropy fails to be concave for nonsim- 
plicial polytopes in dimension d = 2, the lowest dimen- 
sion in which there exist nonsimplicial polytopes. This 
follows from the fact that all of the maximal faces of a 
2-dimensional nonsimplicial polytope are line segments, 
which are simplices, so that the conditions of the lemma 
apply. □ 

We now prove the lemma. 

Proof. (Lemma). Suppose S is a d-dimensional polytope
that satisfies the conditions of the lemma, that is, it is
nonsimplicial, but all of its maximal faces are simplicial.
In this case, one can always find two maximal faces
((d-1)-dimensional simplices), $F_1$ and $F_2$, whose intersection,
$F_1 \cap F_2$, is a (d-2)-dimensional simplex. We define $V_1$ to
be the vertex of $F_1$ that is not contained in $F_1 \cap F_2$; $V_2$ is
defined similarly. Let $\rho_1$ be the barycenter of $F_1$, $\rho_2$ the
barycenter of $F_2$ and $\rho_3$ the barycenter of $F_1 \cap F_2$. The
figures provide examples of pairs of such faces in different
dimensions.




FIG. 7: Example of failure of concavity for a 2d nonsimplicial 
polytope. 




FIG. 8: Example of failure of concavity for a 3d nonsimplicial 
polytope. Here $\rho$ is in the interior of H.




FIG. 9: Example of failure of concavity for a 3d nonsimplicial 
polytope. Here $\rho$ is on the boundary of H.



Let V be a vertex of S that is not contained in $F_1$ or
in $F_2$. Such a vertex always exists because if it did not,
then the total number of vertices in S would be d + 1 and
S would be a simplex, contrary to hypothesis.

Define the (d-1)-dimensional polytope H to be the
convex hull of $F_1 \cap F_2$ and V. Note that H is a simplex.

Define T to be the triangle with vertices $\rho_1$, $\rho_2$ and $\rho_3$.

Define L to be the intersection of T and the (d-1)-dimensional
polytope H. L is a line segment; we defer
establishing this to the end of the proof, since it is somewhat
technical.

Finally, we define the state $\rho$ for which the mixing
entropy will fail to be concave. It is defined as the second
endpoint of L, that is, L is the line segment extending from
$\rho_3$ to $\rho$.

Now the proof proceeds differently depending on
whether $\rho$ is in the interior or in the boundary (relative
to aff(H)) of H.

i) $\rho$ is in the relative boundary of H.

In this case, $\rho$ lies on a proper face of H. Because H is a simplex
of dimension d-1, every such face has at most d-1 vertices
and consequently the mixing entropy of $\rho$ satisfies

$$S(\rho) \le \log(d-1). \qquad (A1)$$

By definition, $\rho \in T$, so that it is a convex combination
of $\rho_1$, $\rho_2$ and $\rho_3$,

$$\rho = p_1 \rho_1 + p_2 \rho_2 + p_3 \rho_3, \qquad (A2)$$

where the $p_i$ form a probability distribution. Because $\rho_1$
($\rho_2$) is the barycenter of $F_1$ ($F_2$), which has d vertices,
its mixing entropy is

$$S(\rho_1) = S(\rho_2) = \log d, \qquad (A3)$$

while because $\rho_3$ is the barycenter of $F_1 \cap F_2$ with d-1
vertices, we have

$$S(\rho_3) = \log(d-1). \qquad (A4)$$

Recalling that L is 1-dimensional, we know that $\rho \ne \rho_3$,
or equivalently, $p_1 + p_2 > 0$, which implies that

$$\sum_{i=1}^{3} p_i S(\rho_i) > \log(d-1). \qquad (A5)$$

From Eqs. (A1) and (A5) we infer that

$$S(\rho) < \sum_i p_i S(\rho_i), \qquad (A6)$$

that is, the mixing entropy fails to be concave.

ii) $\rho$ is among the relative interior points of H.
In this case,

$$\rho = p_1 \rho_1 + p_2 \rho_2, \qquad (A7)$$

that is, $\rho$ lies on the line segment defined by $\rho_1$ and $\rho_2$.
The proof of (A7) is by contradiction. Suppose that $\rho$
lies in the relative interior of T as well as in the relative
interior of H. Then, there is an aff(T)-ball $B_1$ around $\rho$
contained in the relative interior of T and an aff(H)-ball
$B_2$ around $\rho$ contained in the relative interior of H. $B_1 \cap B_2$ is a
line segment $L_\rho \subseteq L$ with midpoint $\rho$. But the fact that
$\rho$ is the midpoint of $L_\rho$ contradicts the fact that it is an
extremal point of L.

It follows from Eq. (A7) and the fact that $\rho_1$, $\rho_2$ are
barycenters of (d-1)-dimensional simplices that

$$\sum_i p_i S(\rho_i) \; \big(= p_1 S(\rho_1) + p_2 S(\rho_2)\big) = \log d. \qquad (A8)$$



Next, we show that p cannot be the barycenter of H. 
We begin by demonstrating that p is the barycenter of 



H' = conv {Fi n F2, V) , where V = piVi + P2V2 and 
where conv {S, S') is the convex hull of 5 ft S'. Letting 
bary (F) denote the barycenter of F, the proof is as fol- 
lows, writing Xi,i £ {l,...,d — 1}, for the d—1 vertices 
of Fi nFs. 

p = pibary (Fi) +p2bary(F2) (A9) 

= Pl^(E^'+^l)+^2^£^*+^2) (AlO) 

2=1 i=l 
^ d-1 

= ^(^X, +pil/i-Hp2l^2) (All) 
i=l 

= bary (conv (Fi nF2,pit/i +P2V^2)) (A12) 
= bary(ff')- (A13) 

However, $H' = \mathrm{conv}(F_1 \cap F_2, V')$ has a different barycenter from $H = \mathrm{conv}(F_1 \cap F_2, V)$ because $V$ is distinct from $V'$. The latter follows from the fact that $V$, $V_1$ and $V_2$ are all vertices of the nonsimplicial polytope $S$, and consequently $V$ cannot be in $\mathrm{conv}(F_1 \cap F_2, V_1, V_2)$, unlike $V'$, which is.

Given that $\rho \in H$ but is not at its barycenter, and given that $H$ has $d$ vertices, we have

$$S(\rho) < \log d. \qquad \text{(A14)}$$

From Eqs. (A8) and (A14) we infer the failure of concavity of the mixing entropy.
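As a concrete instance of case (ii), consider the unit square with vertices $A, B, C, D$, the simplest non-simplicial polytope. This example and the helper names `shannon` and `mixing_entropy` are illustrative assumptions, not part of the proof. Take $F_1 = AB$ and $F_2 = BC$, so that $d = 2$, $F_1 \cap F_2 = \{B\}$, $V = D$, and $\rho = \tfrac12(\rho_1 + \rho_2) = \tfrac34 B + \tfrac14 D$ lies in the relative interior of $H = \mathrm{conv}(B, D)$. The sketch computes mixing entropy by brute force over supports, relying on the fact that a concave function attains its minimum over a polytope at a vertex.

```python
import itertools

import numpy as np


def shannon(weights):
    """Shannon entropy in bits, ignoring (numerically) zero weights."""
    w = np.asarray(weights, dtype=float)
    w = w[w > 1e-12]
    return float(-(w * np.log2(w)).sum())


def mixing_entropy(state, vertices, tol=1e-9):
    """Minimum Shannon entropy over decompositions of `state` into vertices.

    Every candidate found is a valid convex decomposition; the minimum is
    attained on a support of at most dim + 1 affinely independent vertices.
    """
    vertices = np.asarray(vertices, dtype=float)
    n, dim = vertices.shape
    target = np.append(state, 1.0)              # last entry enforces sum(w) = 1
    best = np.inf
    for k in range(1, dim + 2):
        for support in itertools.combinations(range(n), k):
            A = np.vstack([vertices[list(support)].T, np.ones(k)])
            w, *_ = np.linalg.lstsq(A, target, rcond=None)
            if np.allclose(A @ w, target, atol=tol) and (w > -tol).all():
                best = min(best, shannon(np.clip(w, 0.0, None)))
    return best


square = [(0, 0), (1, 0), (1, 1), (0, 1)]       # A, B, C, D
rho_1 = (0.5, 0.0)                              # barycenter of F_1 = AB
rho_2 = (1.0, 0.5)                              # barycenter of F_2 = BC
rho = (0.75, 0.25)                              # (rho_1 + rho_2) / 2

s_1 = mixing_entropy(rho_1, square)
s_2 = mixing_entropy(rho_2, square)
s_rho = mixing_entropy(rho, square)
print(s_1, s_2, s_rho)
```

Here $S(\rho_1) = S(\rho_2) = 1$ bit, matching $\log d$ in (A8), while $S(\rho) \approx 0.811$ bits comes from the decomposition $\tfrac34 B + \tfrac14 D$, so $S(\rho) < \tfrac12 S(\rho_1) + \tfrac12 S(\rho_2)$ as in (A14).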

We finish the proof by establishing the claim that $L$, defined above, is a line segment. First note that because it is an intersection of convex compact sets, it is compact and convex. Because $\dim(\mathrm{aff}(T)) = 2$ and $\dim(\mathrm{aff}(H)) = d - 1$, $\mathrm{aff}(T) \cap \mathrm{aff}(H)$ is one or two dimensional. For it to be two dimensional would require $T \subset \mathrm{aff}(H)$, implying $\rho_1, \rho_2 \in \mathrm{aff}(H)$, and hence, since $F_1 \cap F_2 \subset H$, that $V_1, V_2$ lie in the hyperplane $\mathrm{aff}(H)$. That contradicts the assumption that $F_1$, $F_2$ are distinct maximal faces.

Since $L \subset \mathrm{aff}(T) \cap \mathrm{aff}(H)$, $L$ is at most one-dimensional. To show it is at least 1-dimensional we begin by observing that because they are subsets of $S$, both $T$ and $H$ lie in the "tangent wedge" $W$ to $S$ at $\rho_3$, i.e. the intersection of the halfspaces $\mathrm{aff}(F_1)_+$ and $\mathrm{aff}(F_2)_+$. Here $\mathrm{aff}(F_i)_+$ is defined to be the closed half-space to the polytope $S$'s side of $\mathrm{aff}(F_i)$. In fact, $V$ lies in the interior of $W$ because if it lay in $\mathrm{aff}(F_1)$ or $\mathrm{aff}(F_2)$, our assumption that all maximal faces were simplices would be violated. Viewing $\rho_3$ as the origin of a real linear space, and noting that $\mathrm{lin}(T)$ and $\mathrm{lin}(F_1 \cap F_2)$ are complementary subspaces (they span the space and intersect only at 0, i.e. $\rho_3$), we can decompose $V$ in a unique way into a component in $\mathrm{lin}(T)$ and a component in the edge, $\mathrm{lin}(F_1 \cap F_2)$, of the tangent wedge.

Let $q$ be the linear projection with kernel $\mathrm{lin}(F_1 \cap F_2)$ and image $\mathrm{lin}(T)$. For any set $X$ such that $X = X + \mathrm{lin}(F_1 \cap F_2)$, $q(X) = \mathrm{lin}(T) \cap X$. Both $W$ and $\mathrm{aff}(H)$ satisfy this condition. As already noted, $V \in \mathrm{int}\, W$; this is equivalent to $q(V)$ being in the relative interior of $\mathrm{cone}(T)$. Since $V \in H$, $q(V)$ is also in $\mathrm{aff}(H)$. Therefore $\mathrm{cone}(T) \cap \mathrm{aff}(H)$ is an interior ray $r$ of $\mathrm{cone}(T)$. Now, since $\rho_3$ is the barycenter of $F_1 \cap F_2$, there is a $(d-2)$-dimensional $\mathrm{aff}(F_1 \cap F_2)$-ball $B_0$ around $\rho_3$ contained entirely in $F_1 \cap F_2$. Furthermore, $B_0$ is the intersection of a $(d-1)$-dimensional $\mathrm{aff}(H)$-ball $B$ around $\rho_3$ with $F_1 \cap F_2$. $\mathrm{aff}(F_1 \cap F_2)$ divides $\mathrm{aff}(H)$ into halfspaces, and the $H$-side half-ball $B \setminus B_0$ is contained entirely in the relative interior of $H$. Since, as established near the beginning of our argument, $\mathrm{aff}(T) \cap \mathrm{aff}(H)$ is a line in $\mathrm{aff}(H)$, and we now know that while it contains $\rho_3$ it is not entirely contained in $F_1 \cap F_2$, it must intersect $B \setminus B_0$. Its intersection with $B \setminus B_0$ is contained in $H$. By choosing $B$ small enough, we can ensure that this intersection is also contained in $T$. This is obvious from two-dimensional geometry. To be slightly more explicit, the facts that $r$, i.e. the half of $\mathrm{aff}(H) \cap \mathrm{aff}(T)$ on the $H$-side of $\mathrm{aff}(F_1 \cap F_2)$, is interior to $\mathrm{cone}(T)$, that $\mathrm{cone}(T)$ is generated by $T$, and that $T$ is closed under multiplication by scalars in $[0, 1]$, ensure this. Since $\mathrm{aff}(T) \cap \mathrm{aff}(H) \cap B$ is contained in both $H$ and $T$, it is contained in $L$; since it is one-dimensional, so is $L$ and so, since $L$ is a compact convex set, $L$ is a line segment. □

In generalized theories we can define (cf. also 0| , where analogous quantities for convex-sets-based theories were defined, and their failure to be concave in general was also observed) measurement-entropy-like quantities $H_\Gamma$ based on any function $\Gamma$ that (like entropy) is Schur-concave and defined on finite lists of classical probabilities. For a state $\rho$, $H_\Gamma(\rho)$ is defined as the infimum over tests of the value of $\Gamma$ on the probabilities for the results of the test. We define $U_d$ for positive integers $d$ as the uniform distribution on $d$ alternatives. The same proof as before (with $\Gamma(U_d)$ in place of $\log d$) gives us:



Proposition 1 For any $\Gamma$ whose value on $U_{d+1}$ is strictly greater than its value on $U_d$, for all $d$, for example any strictly Schur-concave $\Gamma$, the only polytopes on which $H_\Gamma$ is concave are simplices.



APPENDIX B: ENTROPY AND QUANTUM
AXIOMATICS



That mixing and measurement entropies coincide, as they do in classical and quantum theory, has powerful consequences for the structure of a probabilistic model and, perhaps even more profoundly, for the structure of a probabilistic theory. As already noted, it implies that mixing entropy is concave, which places sharp restrictions on the geometry of state spaces. It also figures importantly in our derivation, in Section V, of information causality. In this Appendix, we explore some further consequences of monoentropicity, and also suggest some other postulates, the physical content of which may be clearer, that enforce this property.

It will be helpful to impose some mild restrictions on the models we consider. (These are satisfied by all of the examples discussed earlier.) First, we want to have enough analytic structure to guarantee that measurement entropies will be well-behaved. Accordingly, in this Appendix we shall require of all models $A = (\mathfrak{A}, \Omega)$ that $\Omega$ be a compact, finite-dimensional convex set, as already assumed in Section II; additionally, we make the following technical, but reasonable and fairly weak, assumptions:



(i) The total outcome-set $X$ is compact in some Hausdorff topology that makes every state $\alpha \in \Omega$ continuous as a function $\alpha : X \to [0, 1]$.

(ii) Write $x \perp y$ to mean that outcomes $x$ and $y$ are distinct and jointly testable, i.e., there exists a test $E \in \mathfrak{A}$ containing them both. We require that $\perp$ be closed as a subset of $X \times X$.

(iii) $\mathfrak{A}$ is compact in the standard topology it inherits from $X$ (as explained below).



Conditions (i) and (ii) have a certain a priori plausibility, and, indeed, are often satisfied in practice: see [s^ for examples of large classes of test spaces satisfying them. Condition (iii) requires some further justification. Conditions (i) and (ii) make $\mathfrak{A}$ a topological test space [35, 3^. With $X$ compact, as in condition (i), $\mathfrak{A}$ has finite rank ([35], Lemma 204). This allows us to topologize the set $\mathfrak{A}$ of tests as a quotient of a suitable subspace of $X^n$, where $n$ is the rank of $\mathfrak{A}$. We call this the standard topology on $\mathfrak{A}$. One can show ([35], Prop. 211) that $\mathfrak{A}$ can be enlarged so as to become compact in this topology, without change to its rank or to its space of continuous states. So condition (iii) is relatively harmless. In fact, if $\mathfrak{A}$ is uniform, meaning that all tests have the same number of outcomes, condition (iii) is automatically satisfied, given (i) and (ii).



It is not difficult to show that $H_E(\alpha)$ is continuous as a function of $E \in \mathfrak{A}$ (see [35], Lemma 210). Hence, for every state $\alpha \in \Omega$, there exists a test $E$ for which $H_E(\alpha) = H(\alpha)$. This justifies the assumption to this effect made in Section III.

We can now characterize those states having zero measurement or mixing entropy.



Lemma 6 Let $\alpha$ be a state of a system $A = (\mathfrak{A}, \Omega)$ satisfying the standing assumptions just discussed.

(a) $H(\alpha) = 0$ iff there exists an outcome $x \in X$ with $\alpha(x) = 1$.

(b) If $S(\alpha) = 0$, then $\alpha$ is the limit of a sequence of pure states of $\Omega$.

Proof. (a) "if" follows immediately from the definition, with $E$ any test containing $x$; "only if" from the fact (established just above the statement of the Lemma) that $H(\alpha) = H_E(\alpha)$ for some $E \in \mathfrak{A}$. For (b), note that if $p = (p_1, \ldots, p_n)$ is a discrete probability distribution with $H(p) < \varepsilon$, then $\max\{p_i\} > 2^{-\varepsilon}$. Now if $S(\alpha) = 0$, we can find, for any sequence $\varepsilon_k$ decreasing to 0, a sequence of pure-state ensembles $\{p_{i,k}\alpha_{i,k} \mid i = 1, \ldots, n_k\}$ for $\alpha$ (so that $\alpha = \sum_{i=1}^{n_k} p_{i,k}\alpha_{i,k}$ for every $k$) with $H(p_k) < \varepsilon_k$. Ordering each ensemble so that $p_{1,k} = \max\{p_{i,k} \mid i = 1, \ldots, n_k\}$, we find, as above, that $p_{1,k} > 2^{-\varepsilon_k}$. Since $\alpha = p_{1,k}\alpha_{1,k} + \sum_{i=2}^{n_k} p_{i,k}\alpha_{i,k}$, we have $\alpha \geq p_{1,k}\alpha_{1,k}$ in the pointwise order on $X = \bigcup \mathfrak{A}$; consequently, $\|\alpha - p_{1,k}\alpha_{1,k}\| = (\alpha - p_{1,k}\alpha_{1,k})(u) = 1 - p_{1,k}$. Thus, $\alpha = \lim_k \alpha_{1,k}$. □
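The bound used in part (b) follows from $H(p) = \sum_i p_i \log(1/p_i) \geq \log(1/\max_i p_i)$, since every term satisfies $\log(1/p_i) \geq \log(1/\max_j p_j)$; hence $\max_i p_i \geq 2^{-H(p)} > 2^{-\varepsilon}$ whenever $H(p) < \varepsilon$. A small numerical sanity check, assuming base-2 logarithms (as the $2^{-\varepsilon}$ in the proof suggests) and using a hypothetical helper `shannon_bits`:

```python
import numpy as np


def shannon_bits(p):
    """Shannon entropy in bits of a probability vector, skipping zero entries."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


rng = np.random.default_rng(1)
# Random distributions on 2..11 outcomes, skewed toward peaked ones.
samples = [rng.dirichlet(np.full(n, 0.5)) for n in range(2, 12) for _ in range(50)]

# gap = max_i p_i - 2^{-H(p)} should never be negative.
gaps = [p.max() - 2.0 ** (-shannon_bits(p)) for p in samples]
print(min(gaps) >= -1e-12)
```

The gap vanishes exactly for uniform distributions, where $\max_i p_i = 1/n = 2^{-\log n}$.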

The converse to part (b) would trivially be true if the mixing entropy were continuous on the convex set $\Omega$. However, as Example 5 of the main text shows, it need not be.

Call a model $A = (\mathfrak{A}, \Omega)$ unital iff for every outcome $x \in X := \bigcup \mathfrak{A}$, there exists at least one state $\alpha$ with $\alpha(x) = 1$. If this state is unique (and therefore pure) for every $x$, we say that $A$ is sharp. In this case, we write $\epsilon_x$ for the unique state with $\epsilon_x(x) = 1$. In the literature of quantum axiomatics, sharpness has sometimes been taken as an axiom (sometimes called Gunson's Axiom) [13]. Lemma 6 has the following corollary:

Corollary 3 Suppose $A$ is monoentropic, and that the set of pure states in $A$ is closed. Then $A$ is sharp.

Proof. If $\alpha$ is a pure state, then $H(\alpha) = S(\alpha) = 0$. By Lemma 6, there exists a measurement outcome $x$ with $\alpha(x) = 1$. On the other hand, if $\alpha(x) = 1$, then $S(\alpha) = H(\alpha) = 0$, whence, again by Lemma 6, $\alpha$ is the limit of a sequence of pure states, say $\epsilon_n \to \alpha$. By assumption, the set of pure states is closed, so $\alpha$ is pure. Since the set of states assigning unit probability to $x$ is convex, it follows that $\alpha$ is the unique such state. □

While the condition that the set of pure states be 
closed is not totally innocent (consider, e.g., Example 2
above), neither is it unreasonable. For example, it will be 
satisfied if there exists a compact group of symmetries of 
the state space that acts transitively on the pure states. 

The condition that measurement and mixing entropies 
coincide also places some constraints on how systems 
compose: 



Lemma 7 Suppose that $AB = (\mathfrak{C}, \Omega^{AB})$ is a composite (in the sense of Section IV) of systems $A = (\mathfrak{A}, \Omega^A)$ and $B = (\mathfrak{B}, \Omega^B)$. Suppose that $A$, $B$ and $AB$ have closed sets of pure states, and are monoentropic. If $\Omega^{AB}$ contains an entangled pure state, then $\mathfrak{C}$ must contain a non-product outcome.

Proof. By Corollary 3, $A$, $B$ and $AB$ are sharp. If $x \in \bigcup \mathfrak{A}$ and $y \in \bigcup \mathfrak{B}$ are outcomes of $A$ and $B$, respectively, and $\epsilon_x$, $\epsilon_y$ and $\epsilon_{xy}$ are the unique pure states making $x$, $y$ and $xy$ certain, then $\epsilon_{xy} = \epsilon_x \otimes \epsilon_y$. Now if $\rho$ is a pure entangled state in $\Omega^{AB}$, then $S(\rho) = 0$. If $H = S$, then $H(\rho) = 0$, whence $\rho = \epsilon_z$ for some outcome $z \in \bigcup \mathfrak{C}$. If $z$ is a product outcome, say $z = xy$, then $\rho = \epsilon_x \otimes \epsilon_y$, a contradiction. □

We now consider whether the condition that $H = S$ can be derived from more physically transparent considerations.

Definition 8 A probabilistic theory has the pure conditioning property iff, for every pair of systems $A = (\mathfrak{A}, \Omega^A)$ and $B = (\mathfrak{B}, \Omega^B)$, every pure state $\omega$ of $AB$, and all outcomes $x$ of $A$ and $y$ of $B$, the conditional states $\omega^{B|x}$ and $\omega^{A|y}$ are pure.

Lemma 8 If a theory satisfies the pure conditioning property, then for any pure bipartite state $\omega$ on a composite $AB$, we have $S(\omega^A) \leq H(\omega^B)$ and $S(\omega^B) \leq H(\omega^A)$.

Proof. Let $\omega$ be a pure bipartite state. Pick an observable $E$ minimizing measurement entropy for $\omega^A$, so that $H(\omega^A)$ is the Shannon entropy $H_E(\omega^A) := -\sum_{x \in E} \omega^A(x) \log(\omega^A(x))$. We have $\omega^B = \sum_{x \in E} \omega^A(x)\, \omega^{B|x}$. By pure conditioning (and the assumption that $\omega$ is pure), the conditional states $\omega^{B|x}$ are pure. By definition, $S(\omega^B)$ is the minimum Shannon entropy of the mixing coefficients in any pure-state ensemble for $\omega^B$, so $S(\omega^B) \leq H_E(\omega^A) = H(\omega^A)$. By the same argument, $S(\omega^A) \leq H(\omega^B)$. □

Definition 9 A theory has the steering property iff, for every pair of systems $A$ and $B$, every pure bipartite state $\omega$ of $AB$ steers its marginals, in the sense that for any convex decomposition $\omega^B = \sum_i p_i \beta_i$, with $\beta_i$ pure and distinct from each other, there is a test $E = \{x_i\}$ of $A$ with $\beta_i = \omega^{B|x_i}$, and similarly for $\omega^A$.

The term "steering" is due to Schrödinger [30], who showed that quantum theory is steering; further proofs and extensions are in Hadjisavvas and Hughston, Jozsa, and Wootters [21]; a survey is ^] .

Lemma 2 If a theory has the steering property, then for every pure bipartite state $\omega$, $H(\omega^A) \leq S(\omega^B)$.

Proof. For any $\varepsilon > 0$, choose a convex decomposition $\omega^B = \sum_i p_i \beta_i$ of $\omega^B$ into pure states $\beta_i$, with $S(\omega^B) \geq H(p_i) - \varepsilon$. Since the state $\omega$ is steering, there exists a test $E = \{x_i\}$ with $\omega^{B|x_i} = \beta_i$, whence $p_i = \omega^A(x_i)$. It follows that $S(\omega^B) \geq -\sum_i p_i \log(p_i) - \varepsilon = H_E(\omega^A) - \varepsilon \geq H(\omega^A) - \varepsilon$. Since $\varepsilon$ is arbitrary, $S(\omega^B) \geq H(\omega^A)$. □

Definition 10 A state $\alpha$ on a system $A$ in an abstract probabilistic theory is purifiable iff there exists a pure bipartite state $\omega$ (a purification of $\alpha$) on a composite $AB$, with $B$ a copy of $A$, with $\omega^A = \omega^B = \alpha$. An abstract probabilistic theory has the purifiability property iff every state in the theory is purifiable.

Quantum mechanics has the purifiability property. D'Ariano et al. have considered a condition very similar to purifiability as a potential axiom for quantum theory, and have shown that many other features of quantum theory follow from it. From the Lemmas above, we have

Proposition 2 A theory that has the pure conditioning, 
steering and purifiability properties is monoentropic. 
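Quantum theory is the motivating monoentropic example: both entropies equal the von Neumann entropy. The following sketch illustrates one half of this numerically for tests given by orthonormal bases. The random state, the dimension, and the helper names (`random_density_matrix`, `random_basis`, `basis_test_entropy`) are assumptions made for illustration; it checks that the eigenbasis test attains the eigenvalue entropy and that no sampled basis does better.

```python
import numpy as np

rng = np.random.default_rng(2)


def shannon_bits(p):
    """Shannon entropy in bits, skipping numerically zero entries."""
    p = np.asarray(p, dtype=float)
    p = p[p > 1e-15]
    return float(-(p * np.log2(p)).sum())


def random_density_matrix(dim):
    """A full-rank density matrix G G^dagger / Tr(G G^dagger)."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real


def random_basis(dim):
    """Columns of the Q factor of a random complex matrix: an orthonormal basis."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, _ = np.linalg.qr(g)
    return q


def basis_test_entropy(rho, basis):
    """Shannon entropy of the outcome probabilities <e_i|rho|e_i>."""
    probs = np.einsum("ij,jk,ki->i", basis.conj().T, rho, basis).real
    return shannon_bits(probs)


dim = 3
rho = random_density_matrix(dim)
eigvals, eigvecs = np.linalg.eigh(rho)

von_neumann = shannon_bits(eigvals)            # entropy of the spectral ensemble
h_eigen = basis_test_entropy(rho, eigvecs)     # test in the eigenbasis
h_random = [basis_test_entropy(rho, random_basis(dim)) for _ in range(200)]

print(von_neumann, h_eigen, min(h_random))
```

The spectral decomposition is itself a pure-state ensemble for the state, so the same number bounds the mixing entropy from above; sampling bases is only a spot check, not a proof of the infimum.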



APPENDIX C: LINEARIZED TEST SPACE 
MODELS, ORDERED LINEAR SPACE MODELS, 
AND ENTROPY 



The apparatus of states on test spaces can be linearized, as follows. If $A = (\mathfrak{A}, \Omega)$, with total outcome space $X = \bigcup \mathfrak{A}$, let $V(A)$ denote the span of $\Omega$ in $\mathbb{R}^X$, regarded as an ordered real vector space with positive cone $V_+(A) = \{\alpha \in V(A) \mid \alpha(x) \geq 0 \ \forall x \in X\}$. Every outcome $x \in X$ defines a positive linear evaluation functional $f_x \in V^*(A)$ by $f_x(\mu) = \mu(x)$ for all $\mu \in V(A)$. Moreover, for every test $E \in \mathfrak{A}$ one has $\sum_{x \in E} f_x = u$, where $u$ is the unique functional taking the constant value 1 on $\Omega$. Abstracting, one defines an effect to be a positive linear functional $f \in V^*(A)$ with $0 \leq f(\alpha) \leq 1$ for all $\alpha \in \Omega$ (equivalently, $0 \leq f \leq u$); an observable on $A$ is a sequence $f_1, \ldots, f_n$ of effects with $\sum_i f_i = u$.
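To make the linearization concrete, here is a minimal sketch for a hypothetical test space with two binary tests and all deterministic states (an illustrative assumption; the outcome labels and variable names are not taken from the text). It shows that the sums $\sum_{x \in E} f_x$ for different tests are different vectors in $\mathbb{R}^X$ yet define the same functional $u$ on $V(A)$.

```python
import numpy as np

# Hypothetical test space: two binary tests with disjoint outcome sets.
outcomes = ["a0", "a1", "b0", "b1"]
tests = [("a0", "a1"), ("b0", "b1")]
index = {x: i for i, x in enumerate(outcomes)}

# Pure states: one certain outcome per test, written as functions on X.
pure_states = np.array([
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
], dtype=float)


def f(x):
    """Evaluation functional f_x, represented as a coordinate vector on R^X."""
    e = np.zeros(len(outcomes))
    e[index[x]] = 1.0
    return e


# One candidate representative of the unit functional u per test.
sums = [sum(f(x) for x in E) for E in tests]

# V(A) is the span of the states; here it is a 3-dimensional subspace of R^4.
dim_V = np.linalg.matrix_rank(pure_states)

# Each representative takes the value 1 on every state.
values = np.array([[s @ alpha for alpha in pure_states] for s in sums])
print(dim_V, values)
```

Because `dim_V` is 3 rather than 4, distinct vectors in $\mathbb{R}^X$ can restrict to the same functional on $V(A)$, which is what makes $u$ well defined.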

From this point of view, the structure of the test space is essentially a privileged set of observables: an additional structure that (like a preferred basis for a vector space) may or may not carry some useful information, or may simply be a computational convenience. For example, if $A(\mathbf{H}) = (\mathfrak{F}(\mathbf{H}), \Omega(\mathbf{H}))$ is a quantum system, $V(A)$ is the space of quadratic forms associated with (but one might as well say, the space of) Hermitian operators on $\mathbf{H}$, and $V^*$ is essentially the same space, under the duality $a(\rho) = \mathrm{Tr}(\rho a)$. In particular, an effect is a positive operator between 0 and 1, and an observable is essentially a discrete POVM. The convex sets, or ordered linear spaces, formalism takes this kind of combination of a convex state space and a set of effects in the dual cone to the state space, as primary. Roughly, a convex model is defined by taking a convex compact set $\Omega$ of states as a base for a cone $V(\Omega)_+$ of unnormalized states, and a cone of "unnormalized allowed effects" that is a closed subcone $V^\#_+$, containing $u$ in its interior, of the dual cone $V^*(\Omega)_+$ of all effects. $u$ is defined by the condition $u(\Omega) = 1$, and the interval $[0, u]$ according to the ordering defined by $V^\#_+$ is the set of effects allowed in the theory. When $V^\#_+ = V^*(\Omega)_+$, the model is called maximal (or sometimes saturated [10]). If the model is constructed from a test space, one will usually want to choose $V^\#$ to contain the effects associated with all outcomes in the test space.
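In the quantum case these definitions can be checked directly. The sketch below uses the qubit "trine" POVM and an arbitrary example state, both chosen here for illustration rather than taken from the text; `trine_effect` and `is_effect` are hypothetical helper names. It verifies that each element is an effect ($0 \leq a \leq 1$), that the elements sum to the unit, and that the duality $a(\rho) = \mathrm{Tr}(\rho a)$ yields a probability distribution.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)


def trine_effect(k):
    """Effect (1/3)(I + n_k . sigma), with n_k at angle 2*pi*k/3 in the x-z plane."""
    theta = 2 * np.pi * k / 3
    return (I2 + np.sin(theta) * sx + np.cos(theta) * sz) / 3


def is_effect(a, tol=1e-12):
    """Check 0 <= a <= 1 in the operator order via the eigenvalues of a."""
    ev = np.linalg.eigvalsh(a)
    return bool(ev.min() >= -tol and ev.max() <= 1 + tol)


effects = [trine_effect(k) for k in range(3)]

# An example density matrix: Hermitian, unit trace, positive determinant.
rho = np.array([[0.75, 0.25], [0.25, 0.25]], dtype=complex)

# Outcome probabilities under the duality a(rho) = Tr(rho a).
probs = np.array([np.trace(rho @ a).real for a in effects])

print(all(is_effect(a) for a in effects), np.allclose(sum(effects), I2), probs)
```

Each trine element is a rank-one operator with eigenvalues 0 and 2/3, so it lies on an extremal ray of the cone of effects without being a projector.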

Two natural distinguished classes of effects are the ray-extremal ones, that is, effects that lie on extremal rays of the cone generated by effects, and atomic effects, i.e., maximal effects in extremal rays (equivalently, ray-extremal effects that are also extremal in the convex set $[0, u]$ of effects). We may define the measurement entropy as the infimum of entropies obtainable by measuring observables consisting of ray-extremal elements, or alternatively as the infimum of entropies obtainable by measuring observables consisting of atomic effects. Intuitively, the observables consisting of ray-extremal effects
are maximally fine-grained. Ray-extremal effects cannot 
be further refined by decomposing them as sums of other 
effects. Although they can be decomposed as sums of 
shrunken versions of themselves, intuitively this cannot 
provide any additional information about the system be- 
ing measured. Certainly in the case of atomic effects, 
and probably with some care and relabeling in the case 
of ray-extremal effects (which unlike atomic effects may 
appear more than once in a given observable), the mea- 
surements with such outcomes can be organized into dis- 
tinguished test spaces associated with a given convex-sets 
model, so the test space framework we use in the main 
text will probably cover this natural possibility, although 
the additional assumptions we make to obtain particular 
results will need to be checked for these cases. In the 
case of ray-extremal effects, the infimum in the defini- 
tion of measurement entropy will likely not be changed if 
we omit measurements in which an effect appears more 
than once; the measurements without repetitions should 
be easier to organize into a test space. Linearization and 
the application of one of these definitions may well re- 
move pathologies in measurement entropy that are as- 
sociated with some test space/state space models. The 
spirit of the definition of measurement entropy via an in- 
fimum suggests excluding tests that are not maximally 
fine grained when viewed from the convex states per- 
spective, as the above definitions do. Passing to these 
definitions may also remove pathologies that might arise 
when the set of distinguished observables associated with 
tests has an irregular relationship to a state space whose 
underlying geometry is quite regular. 



